Equations: Part I
Abstract

Research on asymptotic model selection in the context of stochastic differential equations (SDE's)
is almost non-existent in the literature. In particular, when a collection of SDE's is considered, the
problem of asymptotic model selection has not hitherto been investigated. Indeed, even though the
diffusion coefficients may be considered known, questions on appropriate choice of the drift
functions constitute a non-trivial model selection problem.

In this article, we develop the asymptotic theory for comparisons between collections of SDE's
with respect to the choice of drift functions using Bayes factors when the number of equations
(individuals) in the collection of SDE's tends to infinity while the time domains remain bounded for each
equation. Our asymptotic theory covers situations when the observed processes associated with the
SDE's are independently and identically distributed (iid), as well as when they are independently
but not identically distributed (non-iid). In particular, we allow incorporation of available
time-dependent covariate information into each SDE through a multiplicative factor of the drift function;
we also permit different initial values and domains of observations for the SDE's.

Our model selection problem thus encompasses selection of a set of appropriate time-dependent 
covariates from a set of available time-dependent covariates, besides selection of the part of the drift 
function free of covariates. 

For both iid and non-iid set-ups we establish almost sure exponential convergence of the Bayes
factor.

Furthermore, we demonstrate with simulation studies that even in non-asymptotic scenarios
the Bayes factor successfully captures the right set of covariates.

Keywords: Bayes factor consistency; Kullback-Leibler divergence; Martingale; Stochastic
differential equations; Time-dependent covariates; Variable selection.


1 Introduction 


In statistical applications where “within” subject variability is caused by some random component
varying continuously in time, stochastic differential equations (SDE's) have important roles to play for
modeling the temporal component of each individual. The inferential abilities of the SDE's can be
enhanced by incorporating covariate information available for the subjects. In these time-dependent
situations it is only natural that the available covariates are also continuously varying with time.
Examples of statistical applications of SDE-based models with time-dependent covariates are
Oravecz et al. (2011), Overgaard et al. (2005), Leander et al. (2015), the first one also considering
the hierarchical Bayesian paradigm.

Unfortunately, asymptotic inference in systems of SDE based models consisting of time-varying
covariates seems to be rare in the statistical literature, in spite of its importance. So far random
effects SDE models have been considered for asymptotic inference, without covariates. We refer to
Delattre et al. (2013) for a brief review, who also undertake theoretical and classical asymptotic
investigation of a class of random effects models based on SDE's. Specifically, they model the $i$-th
individual by

$$dX_i(t) = b(X_i(t), \phi_i)\,dt + \sigma(X_i(t))\,dW_i(t), \tag{1.1}$$

where, for $i = 1, \ldots, n$, $X_i(0) = x^i$ is the initial value of the stochastic process $X_i(t)$, which is
assumed to be continuously observed on the time interval $[0, T_i]$; $T_i > 0$ is assumed to be known. The
function $b(x, \varphi)$, which is the drift function, is a known, real-valued function on $\mathbb{R} \times \mathbb{R}^d$
($\mathbb{R}$ is the real line and $d$ is the dimension), and the function $\sigma : \mathbb{R} \mapsto \mathbb{R}$ is the known
diffusion coefficient. The SDE's given by (1.1) are driven by independent standard Wiener processes
$\{W_i(\cdot);\ i = 1, \ldots, n\}$, and $\{\phi_i;\ i = 1, \ldots, n\}$, which are to be interpreted as the random effect
parameters associated with the $n$ individuals, are assumed by Delattre et al. (2013) to be independent
of the Brownian motions and independently and identically distributed (iid) random variables with some
common distribution. For the sake of convenience Delattre et al. (2013) (see also Maitra and
Bhattacharya (2016c) and Maitra and Bhattacharya (2015)) assume $b(x, \phi_i) = \phi_i b(x)$. Thus, the
random effect is a multiplicative factor of the drift function. In this article, we generalize the
multiplicative factor to include time-dependent covariates.
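To make the model class concrete, the following minimal sketch simulates one path of an SDE whose drift is a multiplicative factor times a function of the state, as in (1.1) with $b(x, \phi_i) = \phi_i b(x)$. It uses the standard Euler-Maruyama discretization, which is not part of the article's theory (the article assumes continuous observation); the function name `euler_maruyama`, the step count and the example drift are all illustrative assumptions.

```python
import math
import random


def euler_maruyama(x0, T, n_steps, drift, diffusion, rng):
    """Simulate one path of dX = drift(t, X) dt + diffusion(X) dW on [0, T].

    This is a hypothetical helper for illustration: `drift(t, x)` plays the
    role of (multiplicative factor at time t) * b(x), and `diffusion(x)` the
    role of the known diffusion coefficient sigma(x).
    """
    dt = T / n_steps
    path = [x0]
    for k in range(n_steps):
        t = k * dt
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        path.append(x + drift(t, x) * dt + diffusion(x) * dw)
    return path


# Example (assumed values): phi_i = 0.5, b(x) = -x, sigma(x) = 1, x^i = 1, T_i = 1.
example_path = euler_maruyama(
    x0=1.0, T=1.0, n_steps=500,
    drift=lambda t, x: 0.5 * (-x),
    diffusion=lambda x: 1.0,
    rng=random.Random(0),
)
```

Replacing the constant 0.5 by a function of `t` gives the time-dependent multiplicative factor that the article introduces in place of the random effect.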

In the case of SDE-based models, proper specification of the drift function and the associated prior
distributions demands serious attention, and this falls within the purview of model selection. Moreover,
when (time-varying) covariate information is available, there arises the problem of variable selection,
that is, the most appropriate subset from the set of many available covariates needs to be chosen. As is
well-known (see, for example, Kass and Raftery (1995)), the Bayes factor (Jeffreys (1961)) is a strong
candidate for dealing with complex model selection problems. Hence, it is natural to consider this
criterion for model selection in SDE set-ups. However, dealing with Bayes factors directly in SDE set-ups
is usually infeasible due to unavailability of closed form expressions, and hence various numerical
approximations based on Markov chain Monte Carlo, as well as related criteria such as the Akaike Information
Criterion (Akaike (1973)) and the Bayes Information Criterion (Schwarz (1978)), are generally employed
(see, for example, Fuchs (2013), Iacus (2008)). But quite importantly, although the Bayes factor and its
variations find use in general SDE models, to our knowledge covariate selection in SDE set-ups has
not been addressed so far.

Moreover, asymptotic theory of Bayes factors in SDE contexts, with or without covariates, is still
lacking (but see Sivaganesan and Lingham (2002) who asymptotically compare three specific diffusion
models in single equation set-ups using intrinsic and fractional Bayes factors). In this paper, our goal
is to develop an asymptotic theory of Bayes factors for comparing different sets of SDE models. Our
asymptotic theory simultaneously involves time-dependent covariate selection associated with a
multiplicative part of the drift function, in addition to selection of the part of the drift function free of
covariates. The asymptotic framework of this paper assumes that the number of individuals tends to
infinity, while their domains of observations remain bounded.

It is important to clarify that the diffusion coefficient is not associated with the question of model
selection. Indeed, it is already known from Roberts and Stramer (2001) that when the associated
continuous process is completely observed, the diffusion coefficient of the relevant SDE can be calculated
directly. Moreover, two diffusion processes with different diffusion coefficients are orthogonal.
Consequently, we assume throughout that the diffusion coefficient of the SDE's is known.

We first develop the model selection theory using Bayes factor in general SDE based iid set-up; note 
that the iid set-up ensues when there is no covariate associated with the model and when the initial values 
and the domains of observations are the same for every individual. The model selection problem in iid 
cases is essentially associated with the choice of the drift functions with no involvement of covariate 
selection. We then extend our theory to the non-iid set-up, consisting of time-varying covariates and
different initial values and domains of observations. Here model selection involves not only selection of 
the part of the drift functions free of the covariates, but also the subset of important covariates from a set 
of available covariates. 

Specifically, we prove almost sure exponential convergence of the relevant Bayes factors in our 
set-ups. Assuming the iid set-up we develop our asymptotic theory based on a general result already 
existing in the literature. However, for the non-iid situation we first develop a general theorem which 
may perhaps be of independent interest, and prove almost sure exponential convergence of the Bayes 
factor in our non-iid SDE set-up as a special case of our theorem. 

It is important to note (as we also clarify subsequently in Section 2.6) that in the asymptotic
framework of this paper, where the domains of observations remain bounded for the individuals,
incorporation of random effects does not make sense from the asymptotic perspective. For this reason
we include random effects in our paper Maitra and Bhattacharya (2016b), where we assume that even
the domains of observations are allowed to increase indefinitely.

The rest of our article is structured as follows. In Section 2 we formalize the problem of model
selection in our aforementioned asymptotic framework. We then present the necessary assumptions and
results in Section 3. In Section 4 we investigate convergence of the Bayes factor when the SDE models
being compared form an iid system of equations. In Section 5 we develop a general asymptotic theory of
Bayes factors in the non-iid situation, and then in Section 6 we investigate exponential convergence of
the Bayes factor when the system of SDE's is non-iid. In Section 7 we demonstrate with simulation
studies that the Bayes factor yields the correct covariate combination for our SDE models even in
non-asymptotic cases. We provide a brief summary of this article and make concluding remarks in the
final section.

The proofs of our lemmas and theorems are provided in the supplementary document Maitra and
Bhattacharya (2016d), whose sections will be referred to in this article by the prefix “S-”.


2 Formalization of the model selection problem in the SDE set-up 

Our assumptions (H2') in Section 3 ensure that our considered systems are well defined and we are able
to compute the exact likelihood. We consider the filtration $(\mathcal{F}_t^{W}, t \ge 0)$, where
$\mathcal{F}_t^{W} = \sigma(W_i(s), s \le t)$. Each process $W_i$ is a $(\mathcal{F}_t^{W}, t \ge 0)$-adapted Brownian motion.

In connection with model selection we must analyze the same data set with respect to two different
models. So, although the distributions of the underlying stochastic process under the two models are
different, to avoid notational complexity we denote the process by $X_i(t)$ under both the models, keeping
in mind that the distinction becomes clear from the context and also by the model-specific parameters.

2.1 The structure of the SDE models to be compared 

Now, let us consider the following two systems of SDE models for $i = 1, 2, \ldots, n$:

$$dX_i(t) = \phi_{i,\xi_0}(t)\, b_{\beta_0}(X_i(t))\,dt + \sigma(X_i(t))\,dW_i(t) \tag{2.1}$$

and

$$dX_i(t) = \phi_{i,\xi_1}(t)\, b_{\beta_1}(X_i(t))\,dt + \sigma(X_i(t))\,dW_i(t) \tag{2.2}$$

where $X_i(0) = x^i$ is the initial value of the stochastic process $X_i(t)$, which is assumed to be
continuously observed on the time interval $[0, T_i]$; $T_i > 0$ for all $i$ and assumed to be known. We assume that
(2.1) represents the true model and (2.2) is any other model. In the above equations, for $j = 0, 1$, $\xi_j$ and
$\beta_j$ denote the sets of parameters associated with the true model and the other model.

2.2 Incorporation of time-varying covariates 

We model $\phi_{i,\xi_j}(t)$ for $j = 0, 1$, as

$$\phi_{i,\xi_j}(t) = \phi_{\xi_j}(z_i(t)) = \xi_{0j} + \xi_{1j}\, g_1(z_{i1}(t)) + \xi_{2j}\, g_2(z_{i2}(t)) + \cdots + \xi_{pj}\, g_p(z_{ip}(t)), \tag{2.3}$$

where $\xi_j = (\xi_{0j}, \xi_{1j}, \ldots, \xi_{pj})$ is a set of real constants for $j = 0, 1$, and
$z_i(t) = (z_{i1}(t), z_{i2}(t), \ldots, z_{ip}(t))$ is the set of available covariate information corresponding to the
$i$-th individual, depending upon time $t$.

We assume $z_i(t)$ is continuous in $t$, $z_{il}(t) \in Z_l$ where $Z_l$ is compact and $g_l : Z_l \to \mathbb{R}$ is continuous, for
$l = 1, \ldots, p$. We let $\mathbf{Z} = Z_1 \times \cdots \times Z_p$, and
$\mathfrak{Z} = \{z(t) \in \mathbf{Z} : t \in [0, \infty) \text{ such that } z(t) \text{ is continuous in } t\}$.
Hence, $z_i \in \mathfrak{Z}$ for all $i$. The functions $b_{\beta_j}$ are multiplicative parts of the drift functions free of the
covariates.
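The linear-in-coefficients structure of (2.3) can be sketched in a few lines. The helper below is purely illustrative (the name `phi` and the example transforms are assumptions, not taken from the article): it evaluates $\xi_{0j} + \sum_{l} \xi_{lj}\, g_l(z_{il}(t))$ for one individual at one time point. Setting a coefficient to zero drops the corresponding covariate, which is how covariate selection enters the model comparison.

```python
import math


def phi(xi, g, z_t):
    """Evaluate xi_0 + sum_l xi_l * g_l(z_l) at a single time point.

    xi  : sequence (xi_0, xi_1, ..., xi_p) of real coefficients
    g   : sequence of p continuous transforms g_1, ..., g_p
    z_t : sequence of p covariate values (z_1(t), ..., z_p(t))
    """
    if not (len(xi) - 1 == len(g) == len(z_t)):
        raise ValueError("need p transforms, p covariate values and p + 1 coefficients")
    return xi[0] + sum(c * gl(zl) for c, gl, zl in zip(xi[1:], g, z_t))


# Example (assumed values): p = 2, second covariate excluded via xi_2 = 0.
value = phi([1.0, 2.0, 0.0], [lambda z: z, math.sin], [3.0, 1.0])
```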






2.3 Model selection with respect to the drift function and the covariates 

We accommodate the possibility that the dimensions of $\beta_0, \beta_1$, associated with the drift functions, may
be different. In reality, $b_{\beta_0}$ may be piecewise linear or convex combinations of linear functions, where
the number of linear functions involved (and hence, the number of associated intercept and slope
parameters) may be unknown. That is, not only the values of the components of the parameter $\beta_0$, but also the
number of the components of $\beta_0$ may be unknown in reality. In general, $b_{\beta_0}$ may be any function, linear
or non-linear, satisfying some desirable conditions. Linearity assumptions may be convenient, but need
not necessarily be unquestionable. In other words, modeling $b_{\beta_0}$ in the SDE context is a challenging
exercise, and hence the issue of model selection in this context must play an important role in the SDE
set-up.

We also accommodate the possibility that $\xi_0$ and $\xi_1$, associated with $\phi_{i,\xi_0}$ and $\phi_{i,\xi_1}$, may be
coefficients associated with different subsets of the available set of $p$ covariates. This has important
implications from the viewpoint of variable selection. Indeed, in a set of $p$ time-dependent covariates, all the
covariates are unlikely to be significant, particularly if $p$ is large. Thus, some (perhaps, many) of the
coefficients $\xi_{l0}$ associated with the true model must be zero. This means that only a specific subset of
the $p$ covariates is associated with the true model. If a different set of covariates, associated with $\xi_1$,
is selected for actually modeling the data, then the Bayes factor is expected to favour the true set of
covariates associated with $\xi_0$.

If two different models are compared by the Bayes factor, none of which may be the true model, 
then the Bayes factor is expected to favour that model which is closest to the true model in terms of the 
Kullback-Leibler divergence. 


2.4 Form of the Bayes factor 

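For $j = 0, 1$, letting $\theta_j = (\beta_j, \xi_j)$, we first define the following quantities:

$$U_{i,\theta_j} = \int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\, b_{\beta_j}(X_i(s))}{\sigma^2(X_i(s))}\,dX_i(s), \qquad
V_{i,\theta_j} = \int_0^{T_i} \frac{\phi^2_{i,\xi_j}(s)\, b^2_{\beta_j}(X_i(s))}{\sigma^2(X_i(s))}\,ds \tag{2.4}$$

for $j = 0, 1$ and $i = 1, \ldots, n$.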

Let $C_{T_i}$ denote the space of real continuous functions $(x(t), t \in [0, T_i])$ defined on $[0, T_i]$, endowed
with the $\sigma$-field $\mathcal{C}_{T_i}$ associated with the topology of uniform convergence on $[0, T_i]$. We consider the
distribution $P_j^{x^i, T_i, z_i}$ on $(C_{T_i}, \mathcal{C}_{T_i})$ of $(X_i(t), t \in [0, T_i])$ given by (2.1) and (2.2) for $j = 0, 1$. We
choose the dominating measure $P_i$ as the distribution of (2.1) and (2.2) with null drift. So, for $j = 0, 1$,

$$\frac{dP_j^{x^i, T_i, z_i}}{dP_i} = f_{i,\theta_j}(X_i) = \exp\left(U_{i,\theta_j} - \frac{V_{i,\theta_j}}{2}\right), \tag{2.5}$$

where $f_{i,\theta_0}(X_i)$ denotes the true density and $f_{i,\theta_1}(X_i)$ stands for the other density associated with the
modeled SDE.
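Although the theory assumes continuously observed paths, the quantities in (2.4) and (2.5) can be approximated on a fine grid. The sketch below is one such approximation under stated assumptions: the stochastic integral is replaced by a left-endpoint (Itô) sum and the time integral by a Riemann sum. The function name `log_density` and its arguments are hypothetical.

```python
def log_density(path, dt, phi_t, b, sigma):
    """Approximate log f = U - V / 2 from a discretely sampled path.

    path  : values X(0), X(dt), X(2 dt), ...
    phi_t : function t -> multiplicative drift factor at time t
    b     : function x -> covariate-free part of the drift
    sigma : function x -> diffusion coefficient (assumed non-zero)
    Returns (log_f, U, V).
    """
    U = 0.0
    V = 0.0
    for k in range(len(path) - 1):
        x = path[k]
        a = phi_t(k * dt) * b(x)       # full drift at the left endpoint
        s2 = sigma(x) ** 2
        U += a / s2 * (path[k + 1] - x)  # Ito sum for the dX-integral
        V += a * a / s2 * dt             # Riemann sum for the ds-integral
    return U - 0.5 * V, U, V


# Tiny worked example (assumed values): constant drift 2, unit diffusion.
log_f, U, V = log_density([0.0, 1.0, 3.0], 0.5, lambda t: 2.0, lambda x: 1.0, lambda x: 1.0)
```

In this toy example the increments are 1 and 2, so the sum for U is 2·1 + 2·2 and the sum for V is 4·0.5 + 4·0.5.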

Let $\Theta = \mathfrak{B} \times \Gamma$ be the parameter space on which a prior probability measure of $\theta_1$, which we denote
by $\pi(\theta_1)$, is proposed. In the set-up where $n \to \infty$ and $T_i$ are given, we are interested in asymptotic
properties of the Bayes factor, given by $I_0 = 1$ and, for $n \ge 1$,

$$I_n = \int_{\Theta} R_n(\theta_1)\,\pi(d\theta_1), \tag{2.6}$$

as $n \to \infty$, where

$$R_n(\theta_1) = \prod_{i=1}^{n} \frac{f_{i,\theta_1}(X_i)}{f_{i,\theta_0}(X_i)}.$$
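Since (2.6) is an integral of a product of likelihood ratios against the prior, a natural (if crude) numerical approximation is a Monte Carlo average over prior draws, computed on the log scale to avoid underflow when $n$ is large. The sketch below assumes the user supplies a function returning $\log R_n(\theta_1)$; the names `log_mean_exp` and `log_I_n` are illustrative and not from the article, which notes that practical computation usually relies on Markov chain Monte Carlo.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_I_n(log_R_n, prior_draws):
    """Monte Carlo estimate of log I_n = log of the prior average of R_n(theta_1).

    log_R_n     : function theta_1 -> sum_i [log f_{i,theta_1}(X_i) - log f_{i,theta_0}(X_i)]
    prior_draws : iterable of parameter values sampled from the prior
    """
    return log_mean_exp([log_R_n(theta) for theta in prior_draws])
```

Dividing the result by $n$ gives the quantity whose almost sure limit is studied in the sections that follow.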
2.5 The iid and the non-iid set-ups

Note that, for the iid set-up, $\theta_j = (\beta_j, \xi_{0j})$, along with $x^i = x$ and $T_i = T$ for all $i$. Since, for the iid set-up,
$\phi_{i,\xi_j} = \xi_{0j}$, in this case $\Gamma = \mathbb{R}$. Thus, here the problem of model selection reduces to comparing
$\xi_{00} b_{\beta_0}$ with $\xi_{01} b_{\beta_1}$ using the Bayes factor.

In the non-iid set-up we relax the assumptions $\xi_{1j} = \xi_{2j} = \cdots = \xi_{pj} = 0$ and $x^i = x$, $T_i = T$ for
each $i$. Hence, in this case, the model selection problem involves variable selection as well as comparison
between different drift functions.


2.6 No random effects when $T_i$ are given


It is important to perceive that when the $T_i$ are fixed constants, it is not possible to allow random effects
into the model and still achieve consistency of the Bayes factor. This is because in that case the SDE
set-up would simply reduce to $n$ independent models, each with independent sets of parameters, leaving
no scope for asymptotics since the $T_i$ are held constant. In Maitra and Bhattacharya (2016b) we consider
random effects when $T_i \to \infty$ along with $n \to \infty$.

2.7 A key relation between $U_{i,\theta_j}$ and $V_{i,\theta_j}$ in the context of model selection using Bayes
factors

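A useful relation between $U_{i,\theta_j}$ and $V_{i,\theta_j}$ which we will often make use of in this paper is as follows.

$$\begin{aligned}
U_{i,\theta_j} &= \int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\, b_{\beta_j}(X_i(s))}{\sigma^2(X_i(s))}\,dX_i(s) \\
&= \int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\, b_{\beta_j}(X_i(s))}{\sigma^2(X_i(s))}\left[\phi_{i,\xi_0}(s)\, b_{\beta_0}(X_i(s))\,ds + \sigma(X_i(s))\,dW_i(s)\right] \\
&= \int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\,\phi_{i,\xi_0}(s)\, b_{\beta_j}(X_i(s))\, b_{\beta_0}(X_i(s))}{\sigma^2(X_i(s))}\,ds + \int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\, b_{\beta_j}(X_i(s))}{\sigma(X_i(s))}\,dW_i(s) \\
&= V_{i,\theta_0,\theta_j} + \int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\, b_{\beta_j}(X_i(s))}{\sigma(X_i(s))}\,dW_i(s),
\end{aligned} \tag{2.7}$$

with

$$V_{i,\theta_0,\theta_j} = \int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\,\phi_{i,\xi_0}(s)\, b_{\beta_j}(X_i(s))\, b_{\beta_0}(X_i(s))}{\sigma^2(X_i(s))}\,ds. \tag{2.8}$$

Note that $V_{i,\theta_0} = V_{i,\theta_0,\theta_0}$ and $V_{i,\theta_1} = V_{i,\theta_1,\theta_1}$. Also note that, for $j = 0, 1$, for each $i$,

$$E_{\theta_0}\left[\int_0^{T_i} \frac{\phi_{i,\xi_j}(s)\, b_{\beta_j}(X_i(s))}{\sigma(X_i(s))}\,dW_i(s)\right] = 0, \tag{2.9}$$

so that $E_{\theta_0}\left(U_{i,\theta_j}\right) = E_{\theta_0}\left(V_{i,\theta_0,\theta_j}\right)$.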
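The relation in (2.7) to (2.9) can be checked by hand in the simplest possible case. The block below is an illustrative special case only, under the assumptions (not made in the article) that $b_{\beta_j} \equiv 1$, $\sigma \equiv 1$ and there are no covariates, so that the multiplicative factor is the constant $\xi_{0j}$. Under the true model the process is then Brownian motion with drift $\xi_{00}$.

```latex
% Assumed special case: b \equiv 1, \sigma \equiv 1, \phi_{i,\xi_j}(s) = \xi_{0j}.
% True model: dX_i(t) = \xi_{00}\,dt + dW_i(t), hence X_i(T_i) - x^i = \xi_{00} T_i + W_i(T_i).
\begin{align*}
U_{i,\theta_j} &= \int_0^{T_i} \xi_{0j}\, dX_i(s)
               = \xi_{0j}\,\xi_{00}\, T_i + \xi_{0j}\, W_i(T_i), \\
V_{i,\theta_0,\theta_j} &= \int_0^{T_i} \xi_{0j}\,\xi_{00}\, ds
               = \xi_{0j}\,\xi_{00}\, T_i, \\
E_{\theta_0}\!\left(U_{i,\theta_j}\right)
               &= \xi_{0j}\,\xi_{00}\, T_i + \xi_{0j}\, E\!\left[W_i(T_i)\right]
               = V_{i,\theta_0,\theta_j}.
\end{align*}
```

Here the stochastic integral is simply $\xi_{0j} W_i(T_i)$, whose mean is zero, which is exactly the mechanism behind the general identity.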


3 Requisite assumptions and results for the asymptotic theory of Bayes
factor when $n \to \infty$ but $T_i$ are constants for every $i$

We assume the following conditions: 

(H1') The parameter space $\Theta = \mathfrak{B} \times \Gamma$ is such that $\mathfrak{B}$ and $\Gamma$ are compact.

(H2') For $j = 0, 1$, $b_{\beta_j}(\cdot)$ and $\sigma(\cdot)$ are $C^1$ on $\mathbb{R}$ and satisfy $b^2_{\beta_j}(x) \le K_1(1 + x^2 + \|\beta_j\|^2)$ and
$\sigma^2(x) \le K_2(1 + x^2)$ for all $x \in \mathbb{R}$, for some $K_1, K_2 > 0$. Now, due to (H1') the latter boils down
to assuming $b^2_{\beta_j}(x) \le K(1 + x^2)$ and $\sigma^2(x) \le K(1 + x^2)$ for all $x \in \mathbb{R}$, for some $K > 0$.
Because of (H2') it follows from Theorem 4.4 of Mao (2011), page 61, that for all $T_i > 0$, and any
$k \ge 2$,

$$E\left(\sup_{s \in [0, T_i]} |X_i(s)|^k\right) \le \left(1 + 3^{k-1} E|X_i(0)|^k\right)\exp\left(\vartheta T_i\right), \tag{3.1}$$

where

$$\vartheta = \frac{1}{6}(18K)^{\frac{k}{2}}\, T_i^{\frac{k-2}{2}}\left[T_i^{\frac{k}{2}} + \left(\frac{k^3}{2(k-1)}\right)^{\frac{k}{2}}\right].$$

We further assume:


(H3') For every $x$, $b_{\beta_j}(x)$ is continuous in $\beta_j$, for $j = 0, 1$.

(H4') For j = 0,1, 

+ ^ + (3.2) 

where $K_{\beta_j}$ is continuous in $\beta_j$.

(H5') (i) Let $\mathbf{Z} = Z_1 \times Z_2 \times \cdots \times Z_p$ be the space of covariates where $Z_l$ is compact for $l = 1, \ldots, p$
and $z_i(t) = (z_{i1}(t), z_{i2}(t), \ldots, z_{ip}(t)) \in \mathbf{Z}$ for every $i = 1, \ldots, n$ and $t \in [0, T_i]$. Moreover,
$z_i(t)$ are continuous in $t$, so that $z_i \in \mathfrak{Z}$ for every $i$.

(ii) For $j = 0, 1$, the vector of covariates $z_i(t)$ is related to the $i$-th SDE of the $j$-th model via

$$\phi_{i,\xi_j}(t) = \phi_{\xi_j}(z_i(t)) = \xi_{0j} + \sum_{l=1}^{p} \xi_{lj}\, g_l(z_{il}(t)),$$

where, for $l = 1, \ldots, p$, $g_l : Z_l \to \mathbb{R}$ is continuous. Notationally, for a given $z(t)$, we denote
$\phi_{\xi_j}(t) = \phi_{\xi_j}(z(t)) = \xi_{0j} + \sum_{l=1}^{p} \xi_{lj}\, g_l(z_l(t))$.

(iii) For $l = 1, \ldots, p$, and for $t \in [0, T_i]$,

$$\frac{1}{n}\sum_{i=1}^{n} g_l(z_{il}(t)) \to c_l(t), \tag{3.3}$$

and, for $l, m = 1, \ldots, p$; $t \in [0, T_i]$,

$$\frac{1}{n}\sum_{i=1}^{n} g_l(z_{il}(t))\, g_m(z_{im}(t)) \to c_l(t)\, c_m(t), \tag{3.4}$$

as $n \to \infty$, where $c_l(t)$ are real constants.

Note that, given $l$ and $t$, had $z_{il}(t)$ been random and iid with respect to $i$, then (3.3) would hold
almost surely by the strong law of large numbers. Additionally, if $z_{il}(t)$ and $z_{im}(t)$ were independent,
then (3.4) would hold almost surely as well. Hence, in this paper, one may assume that for $i = 1, \ldots, n$,
and $l = 1, \ldots, p$, the covariates $z_{il}$ are observed realizations of stochastic processes that are iid for
$i = 1, \ldots, n$, for all $l = 1, \ldots, p$, and that for $l \ne m$, the processes generating $z_{il}$ and $z_{im}$ are independent.
Thus, in essence, we assume here that for $l \ne m$, $g_l(z_{il}(t))$ and $g_m(z_{im}(t))$ are uncorrelated.

We then have the following lemma, which will be useful for proving our main results. 

Lemma 1 Assume (H1')-(H4'). Then for all $\theta_1 \in \mathfrak{B} \times \Gamma$, for $k \ge 1$,

$$E_{\theta_0}\left[U_{i,\theta_j}\right]^k < \infty; \quad j = 0, 1, \tag{3.5}$$

$$E_{\theta_0}\left[V_{i,\theta_1}\right]^k < \infty, \tag{3.6}$$

$$E_{\theta_0}\left[V_{i,\theta_0,\theta_j}\right]^k < \infty; \quad j = 0, 1. \tag{3.7}$$

Moreover, for $j = 1$, the above expectations are continuous in $\theta_1$.


4 Convergence of Bayes factor in the SDE based iid set-up 

We first consider the iid set-up; in other words, we assume that $x^i = x$, $T_i = T$ for $i = 1, \ldots, n$, and
$j = 0, 1$. In this case $\theta_j = (\beta_j, \xi_{0j})$ for $j = 0, 1$. We shall relax these assumptions subsequently when
we take up the non-iid (that is, independent, but non-identical) case.


4.1 A general result on consistency of Bayes factor in the iid set-up 


To investigate consistency of the Bayes factor, we resort to a general result in the iid set-up developed by
Walker (2004) (see also Walker et al. (2004)). To state the result we first define some relevant notation
which applies to both parametric and nonparametric problems. For any $x$ in the appropriate domain, let

$$\hat f_n(x) = \int f(x)\,\pi_n(df)$$

be the posterior predictive density, where $\pi_n$ stands for the posterior of $f$, given by

$$\pi_n(A) = \frac{\int_A \prod_{i=1}^{n} f(X_i)\,\pi(df)}{\int \prod_{i=1}^{n} f(X_i)\,\pi(df)},$$

and let

$$\hat f_{nA}(x) = \int f(x)\,\pi_{nA}(df)$$

be the posterior predictive density restricted to the set $A$, that is, for the prior probability $\pi(A) > 0$,

$$\pi_{nA}(df) = \frac{I_A(f)\,\pi_n(df)}{\int_A \pi_n(df)},$$

where $I_A$ denotes the indicator function of the set $A$.

Clearly, the above set-up is in accordance with the iid situation. The following theorem of Walker
(2004) is appropriate for our iid set-up.


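Theorem 2 (Walker (2004)) Assume that

$$\pi\left(f : \mathcal{K}(f, f_0) < c_1\right) > 0, \tag{4.1}$$

only for, and for all $c_1 > \delta$, for some $\delta \ge 0$, and that for all $\epsilon > 0$,

$$\liminf_{n}\ \mathcal{K}\left(f_0, \hat f_{nA(\epsilon)}\right) \ge \epsilon \tag{4.2}$$

when $A(\epsilon) = \{f : \mathcal{K}(f_0, f) > \epsilon\}$. Property (4.1) is the Kullback-Leibler property and (4.2) has been
referred to as the $Q^*$ property by Walker (2004). Assume further that

$$\sup_n\ \mathrm{Var}\left(\log\frac{I_{n+1}}{I_n}\right) < \infty. \tag{4.3}$$

Then,

$$n^{-1}\log(I_n) \to -\delta, \tag{4.4}$$

almost surely.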

The following corollary provides the result on asymptotic comparison between two models using Bayes
factors, in the iid case.

Corollary 3 (Walker (2004)) Let $R_n(f) = \prod_{i=1}^{n} \frac{f(X_i)}{f_0(X_i)}$. For $j = 1, 2$, let

$$I_{jn} = \int R_n(f)\,\pi_j(df),$$

where $\pi_1$ and $\pi_2$ are two different priors on $f$. Let $B_n = I_{1n}/I_{2n}$ denote the Bayes factor for comparing
the two models associated with $\pi_1$ and $\pi_2$. If $\pi_1$ and $\pi_2$ have the Kullback-Leibler property (4.1) with
$\delta = \delta_1$ and $\delta = \delta_2$ respectively, satisfy the $Q^*$ property (4.2), and (4.4) with $I_n = I_{jn}$, for $j = 1, 2$, then

$$n^{-1}\log B_n \to \delta_2 - \delta_1,$$

almost surely.


Remark 4 In Walker (2004) the densities are assumed to be dominated by the Lebesgue measure.
However, this is not necessary. The results remain true if the densities are with respect to any valid measure;
see, for example, Barron et al. (1999) for related concepts and results (Lemma 4 in particular) with
respect to general measures. As such, in our SDE-based situation, although the densities are not
dominated by the Lebesgue measure (see (2.5)), all our results still remain valid.


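4.2 Verification of Theorem 2 in iid SDE set-up


In our parametric case, $f_0 \equiv f_{\theta_0}$ and $f \equiv f_{\theta_1}$. In this iid set-up, as mentioned earlier, $\phi_{i,\xi_j} = \xi_{0j}$ for
$j = 0, 1$, so that $\phi_{\xi_j} \equiv \xi_{0j}$. For our convenience, we let, for $j = 0, 1$ and $i = 1, \ldots, n$,

$$U_{i,\beta_j} = \int_0^{T} \frac{b_{\beta_j}(X_i(s))}{\sigma^2(X_i(s))}\,dX_i(s), \qquad
V_{i,\beta_0,\beta_j} = \int_0^{T} \frac{b_{\beta_0}(X_i(s))\, b_{\beta_j}(X_i(s))}{\sigma^2(X_i(s))}\,ds. \tag{4.5}$$

Note that, for $i = 1, \ldots, n$, $V_{i,\beta_0} = V_{i,\beta_0,\beta_0}$ and $V_{i,\beta_1} = V_{i,\beta_1,\beta_1}$. The Kullback-Leibler divergence
measure between $f_0$ and $f$ in this set-up is given, with $i = 1$, by

$$\mathcal{K}(f_{\theta_0}, f_{\theta_1}) = \frac{\xi_{00}^2}{2} E_{\theta_0}\left(V_{1,\beta_0}\right) - \xi_{00}\xi_{01} E_{\theta_0}\left(V_{1,\beta_0,\beta_1}\right) + \frac{\xi_{01}^2}{2} E_{\theta_0}\left(V_{1,\beta_1}\right), \tag{4.6}$$

where $E_{\theta_0} \equiv E_{f_{\theta_0}}$. The result easily follows from (2.7) and (2.9). Now let

$$\delta = \min_{\Theta}\ \mathcal{K}(f_{\theta_0}, f_{\theta_1})
= \min_{\Theta}\left\{\frac{\xi_{00}^2}{2} E_{\theta_0}\left(V_{1,\beta_0}\right) - \xi_{00}\xi_{01} E_{\theta_0}\left(V_{1,\beta_0,\beta_1}\right) + \frac{\xi_{01}^2}{2} E_{\theta_0}\left(V_{1,\beta_1}\right)\right\}. \tag{4.7}$$

Since $E_{\theta_0}(V_{1,\beta_1})$ and $E_{\theta_0}(V_{1,\beta_0,\beta_1})$ are continuous in $\beta_1$, compactness of $\Theta$ guarantees that
$0 \le \delta < \infty$.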
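To see what the Kullback-Leibler divergence and its minimum look like in a concrete case, consider the following worked special case. It is an illustration under assumptions not made in the article: $b_{\beta_0} = b_{\beta_1} \equiv 1$ and $\sigma \equiv 1$, so that each of the three $V$-type integrals in the divergence equals $T$.

```latex
% Assumed special case: b_{\beta_0} = b_{\beta_1} \equiv 1, \sigma \equiv 1,
% so that V_{1,\beta_0} = V_{1,\beta_0,\beta_1} = V_{1,\beta_1} = T.
\begin{align*}
\mathcal{K}(f_{\theta_0}, f_{\theta_1})
  &= \frac{\xi_{00}^2}{2}\, T - \xi_{00}\,\xi_{01}\, T + \frac{\xi_{01}^2}{2}\, T
   = \frac{(\xi_{00} - \xi_{01})^2}{2}\, T, \\
\delta &= \min_{\xi_{01} \in \Gamma} \frac{(\xi_{00} - \xi_{01})^2}{2}\, T .
\end{align*}
```

In this case $\delta = 0$ exactly when the true coefficient $\xi_{00}$ belongs to the parameter set of the postulated model, and otherwise $\delta$ is half the squared distance from $\xi_{00}$ to that set, scaled by $T$.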


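4.2.1 Verification of (4.1)

To see that (4.1) holds in our case for any prior dominated by Lebesgue measure, first let us define

$$\mathcal{K}^*\left(f_{\tilde\theta}, f_{\theta_1}\right) = \mathcal{K}\left(f_{\theta_0}, f_{\theta_1}\right) - \mathcal{K}\left(f_{\theta_0}, f_{\tilde\theta}\right), \tag{4.8}$$

where $f_{\tilde\theta} = \arg\min_{\Theta} \mathcal{K}(f_{\theta_0}, f_{\theta})$. Now, let us choose any prior $\pi$ such that $\frac{d\pi}{d\nu} = \varrho$, where $\varrho$ is a
continuous positive density with respect to Lebesgue measure $\nu$, where, by “positive” density, we mean a
density excluding any interval of null measure. For any $c_1 > 0$, we then need to show that

$$\pi\left(\theta_1 \in \Theta : \delta \le \mathcal{K}(f_{\theta_0}, f_{\theta_1}) \le \delta + c_1\right) > 0,$$

for any prior $\pi$ dominated by Lebesgue measure. This is equivalent to showing

$$\pi\left(\theta_1 \in \Theta : 0 \le \mathcal{K}^*(f_{\tilde\theta}, f_{\theta_1}) \le c_1\right) > 0,$$

for any prior $\pi$ dominated by Lebesgue measure.

Since $\mathcal{K}(f_{\theta_0}, f_{\theta_1})$ is continuous in $\theta_1$, so is $\mathcal{K}^*(f_{\tilde\theta}, f_{\theta_1})$. Compactness of $\Theta$ ensures uniform
continuity of $\mathcal{K}^*(f_{\tilde\theta}, f_{\theta_1})$. Hence, for any $c_1 > 0$, there exists $\epsilon_{c_1}$ independent of $\theta_1$, such that
$\|\theta_1 - \tilde\theta\| \le \epsilon_{c_1}$ implies $\mathcal{K}^*(f_{\tilde\theta}, f_{\theta_1}) \le c_1$. Then,

$$\begin{aligned}
\pi\left(\theta_1 \in \Theta : 0 \le \mathcal{K}^*(f_{\tilde\theta}, f_{\theta_1}) \le c_1\right)
&\ge \pi\left(\theta_1 \in \Theta : \|\theta_1 - \tilde\theta\| \le \epsilon_{c_1}\right) \\
&\ge \left[\inf_{\{\theta_1 \in \Theta : \|\theta_1 - \tilde\theta\| \le \epsilon_{c_1}\}} \varrho(\theta_1)\right] \times \nu\left(\{\theta_1 \in \Theta : \|\theta_1 - \tilde\theta\| \le \epsilon_{c_1}\}\right) > 0,
\end{aligned} \tag{4.9}$$

where $\nu$ stands for Lebesgue measure. In other words, (4.1) holds in our case.
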


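4.2.2 Verification of (4.2)

To see that (4.2) also holds in our SDE set-up, first note that in our case

$$\hat f_{nA(\epsilon)}(x) = \frac{\int_{A(\epsilon)} f_{\theta_1}(x)\,\pi_n(d\theta_1)}{\int_{A(\epsilon)} \pi_n(d\theta_1)}, \tag{4.10}$$

with

$$\begin{aligned}
A(\epsilon) &= \left\{\theta_1 \in \Theta : \mathcal{K}(f_{\theta_0}, f_{\theta_1}) \ge \epsilon\right\} \\
&= \left\{\theta_1 \in \Theta : \frac{\xi_{00}^2}{2} E_{\theta_0}\left(V_{1,\beta_0}\right) - \xi_{00}\xi_{01} E_{\theta_0}\left(V_{1,\beta_0,\beta_1}\right) + \frac{\xi_{01}^2}{2} E_{\theta_0}\left(V_{1,\beta_1}\right) \ge \epsilon\right\}
\end{aligned} \tag{4.11}$$

for any $\epsilon > 0$. Note that, here we have replaced $\mathcal{K}(f_{\theta_0}, f_{\theta_1}) > \epsilon$ with $\mathcal{K}(f_{\theta_0}, f_{\theta_1}) \ge \epsilon$ in the definition
of $A(\epsilon)$ because of continuity of the posterior of $\theta_1$. Note that

$$\hat f_{nA(\epsilon)}(X) \le \sup_{\theta_1 \in A(\epsilon)} f_{\theta_1}(X) = f_{\hat\theta_1(X)}(X), \tag{4.12}$$

where $\hat\theta_1(X)$, which depends upon $X$, is the maximizer lying in the compact set $A(\epsilon)$. Now note that

$$\begin{aligned}
\mathcal{K}\left(f_{\theta_0}, \hat f_{nA(\epsilon)}\right)
&= E_{\theta_0}\left[\log f_{\theta_0}(X)\right] - E_{\theta_0}\left[\log \hat f_{nA(\epsilon)}(X)\right] \\
&\ge E_{\theta_0}\left[\log f_{\theta_0}(X)\right] - E_{\theta_0}\left[\log f_{\hat\theta_1(X)}(X)\right] \\
&= E_{\theta_0}\left[\log\frac{f_{\theta_0}(X)}{f_{\hat\theta_1(X)}(X)}\right] \\
&= E_{\hat\theta_1(X)\mid\theta_0}\, E_{X\mid\hat\theta_1(X)=\vartheta,\,\theta_0}\left[\log\frac{f_{\theta_0}(X)}{f_{\{\hat\theta_1(X)=\vartheta\}}(X)}\right] \\
&= E_{\hat\theta_1(X)\mid\theta_0}\, \mathcal{K}\left(f_{\theta_0}, f_{\vartheta}\right) \\
&\ge E_{\hat\theta_1(X)\mid\theta_0}\, \inf_{\vartheta \in A(\epsilon)} \mathcal{K}\left(f_{\theta_0}, f_{\vartheta}\right) \\
&= E_{\hat\theta_1(X)\mid\theta_0}\, \mathcal{K}\left(f_{\theta_0}, f_{\vartheta^*}\right) \\
&= \mathcal{K}\left(f_{\theta_0}, f_{\vartheta^*}\right) \\
&\ge \epsilon,
\end{aligned} \tag{4.13}$$

where $\vartheta^* = \arg\min_{\vartheta \in A(\epsilon)} \mathcal{K}(f_{\theta_0}, f_{\vartheta})$. Hence, (4.2) is satisfied in our SDE set-up.
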
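4.2.3 Verification of (4.3)

We now prove that (4.3) also holds. It is straightforward to verify that

$$\frac{I_{n+1}}{I_n} = \frac{\hat f_{n+1}(X_{n+1})}{f_{\theta_0}(X_{n+1})}, \tag{4.14}$$

where

$$\hat f_{n+1}(\cdot) = E_{\theta_1 \mid X_1, \ldots, X_n}\left[f_{\theta_1}(\cdot)\right] \tag{4.15}$$

is the posterior predictive distribution of $f_{\theta_1}(\cdot)$, with respect to the posterior of $\theta_1$, given $X_1, \ldots, X_n$.
In (4.15), $E_{\theta_1 \mid X_1, \ldots, X_n}$ denotes expectation with respect to the posterior of $\theta_1$ given $X_1, \ldots, X_n$.

First note that, since

$$\log\left[f_{\theta_0}(X_{n+1})\right] = \xi_{00}\, U_{n+1,\beta_0} - \frac{\xi_{00}^2}{2}\, V_{n+1,\beta_0}, \tag{4.16}$$

it follows from Lemma 1 that the moments of all orders of $\log[f_{\theta_0}(X_{n+1})]$ exist and are finite. Also,
since $X_i$ are iid, the moments are the same for every $n = 1, 2, \ldots$. In other words,

$$\sup_n\ \mathrm{Var}\left(\log f_{\theta_0}(X_{n+1})\right) < \infty. \tag{4.17}$$

Then observe that for any given $X_{n+1}$, using compactness of $\Theta$ and continuity of $f_{\theta_1}(X_{n+1})$ with
respect to $\theta_1$,

$$f_{\theta_1^*(X_{n+1})}(X_{n+1}) = \inf_{\theta_1 \in \Theta} f_{\theta_1}(X_{n+1}) \le \hat f_{n+1}(X_{n+1}) \le \sup_{\theta_1 \in \Theta} f_{\theta_1}(X_{n+1}) = f_{\theta_1^{**}(X_{n+1})}(X_{n+1}),$$

where $\theta_1^*(X_{n+1}) = \arg\min_{\theta_1 \in \Theta} f_{\theta_1}(X_{n+1})$ and $\theta_1^{**}(X_{n+1}) = \arg\max_{\theta_1 \in \Theta} f_{\theta_1}(X_{n+1})$. Clearly,
$\theta_1^*(X_{n+1}), \theta_1^{**}(X_{n+1}) \in \Theta$, for any given $X_{n+1}$. Moreover,

$$\theta_1^*(X_{n+1}) = \left(\beta_1^*(X_{n+1}), \xi_{01}^*(X_{n+1})\right), \qquad \theta_1^{**}(X_{n+1}) = \left(\beta_1^{**}(X_{n+1}), \xi_{01}^{**}(X_{n+1})\right),$$

where each component of $\theta_1^*(X_{n+1})$ and $\theta_1^{**}(X_{n+1})$ depends on $X_{n+1}$. Noting that
$U_{n+1,\theta_1^*(X_{n+1})} = \xi_{01}^*(X_{n+1})\, U_{n+1,\beta_1^*(X_{n+1})}$ and
$V_{n+1,\theta_1^*(X_{n+1})} = \{\xi_{01}^*(X_{n+1})\}^2\, V_{n+1,\beta_1^*(X_{n+1})}$, it follows from the above
inequality that

$$\begin{aligned}
-\left|U_{n+1,\theta_1^*(X_{n+1})}\right| - \frac{V_{n+1,\theta_1^*(X_{n+1})}}{2}
&\le U_{n+1,\theta_1^*(X_{n+1})} - \frac{V_{n+1,\theta_1^*(X_{n+1})}}{2} \\
&\le \log \hat f_{n+1}(X_{n+1}) \\
&\le U_{n+1,\theta_1^{**}(X_{n+1})} - \frac{V_{n+1,\theta_1^{**}(X_{n+1})}}{2} \\
&\le \left|U_{n+1,\theta_1^{**}(X_{n+1})}\right| + \frac{V_{n+1,\theta_1^{**}(X_{n+1})}}{2}.
\end{aligned}$$

Hence, $E_{\theta_0}\left(\log \hat f_{n+1}(X_{n+1})\right)$ lies between
$-E_{\theta_0}\left(\left|U_{n+1,\theta_1^*(X_{n+1})}\right| + \frac{V_{n+1,\theta_1^*(X_{n+1})}}{2}\right)$
and
$E_{\theta_0}\left(\left|U_{n+1,\theta_1^{**}(X_{n+1})}\right| + \frac{V_{n+1,\theta_1^{**}(X_{n+1})}}{2}\right)$.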
We obtain uniform lower and upper bounds of the above two expressions in the following manner.
For the upper bound of the latter we first take supremum of the expectation with respect to $X_{n+1}$,
conditional on $\theta_1^{**}(X_{n+1}) = \varsigma$, over $\varsigma \in \Theta$, and then take expectation with respect to $X_{n+1}$. Since
$\varsigma \in \Theta$, compactness of $\Theta$ and Lemma 1 ensure that the moments of any given order of the above
expression are uniformly bounded above. Analogously, we obtain a uniform lower bound replacing the
supremum with infimum. In the same way we obtain uniform lower and upper bounds of the other
expression. The uniform bounds on the second order moments, in turn, guarantee that

$$\sup_n\ \mathrm{Var}\left(\log \hat f_{n+1}(X_{n+1})\right) < \infty. \tag{4.18}$$

Combining (4.17) and (4.18) and using the Cauchy-Schwarz inequality for the covariance term
associated with $\mathrm{Var}\left(\log \hat f_{n+1}(X_{n+1}) - \log f_{\theta_0}(X_{n+1})\right)$ shows that (4.3) holds in our set-up.

We formalize the above arguments in the form of a theorem in the SDE based iid set-up.

Theorem 5 Assume the iid case of the SDE based set-up and conditions (H1')-(H4'). Then (4.4)
holds.

The following corollary in the iid SDE context is motivated by Corollary 3.

Corollary 6 For $j = 1, 2$, let $R_{jn}(\theta_j) = \prod_{i=1}^{n} \frac{f_{\theta_j}(X_i)}{f_{\theta_0}(X_i)}$, where $\theta_1$ and $\theta_2$ are two different finite sets
of parameters, perhaps with different dimensionalities, associated with the two models to be compared.
For $j = 1, 2$, let

$$I_{jn} = \int R_{jn}(\theta_j)\,\pi_j(d\theta_j),$$

where $\pi_j$ is the prior on $\theta_j$. Let $B_n = I_{1n}/I_{2n}$ as before. Assume the iid case of the SDE based
set-up and suppose that both the models satisfy conditions (H1')-(H4') and have the Kullback-Leibler
property with $\delta = \delta_1$ and $\delta = \delta_2$ respectively. Then

$$n^{-1}\log B_n \to \delta_2 - \delta_1,$$

almost surely.
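The exponential rate can be observed numerically in a toy case. The sketch below is an illustration under several assumptions that are not part of the article: $b \equiv 1$, $\sigma \equiv 1$, no covariates, true coefficient 1, $T = 1$, and a prior placed uniformly on a finite grid over $[2, 3]$ (a grid stand-in for a continuous prior). In that case each path enters only through the increment $X_i(T) - x \sim N(\xi_{00} T, T)$, the log-likelihood ratio is available in closed form, and the minimum divergence over the grid is $(2 - 1)^2 T / 2 = 0.5$, so $n^{-1} \log I_n$ should be close to $-0.5$ for large $n$. All function names are hypothetical.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def simulate_log_In_over_n(n, xi_true=1.0, T=1.0, grid=None, seed=0):
    """Return n^{-1} log I_n in the toy model dX = xi dt + dW (assumed setting).

    With b = 1 and sigma = 1, log f_xi(X_i) = xi * D_i - xi^2 * T / 2, where
    D_i = X_i(T) - x is N(xi_true * T, T) under the true model.
    """
    rng = random.Random(seed)
    if grid is None:
        grid = [2.0 + k / 100.0 for k in range(101)]  # prior support excludes xi_true
    s = sum(xi_true * T + rng.gauss(0.0, math.sqrt(T)) for _ in range(n))
    # log R_n(xi) = (xi - xi_true) * sum_i D_i - n * (xi^2 - xi_true^2) * T / 2
    log_R = [(xi - xi_true) * s - n * (xi ** 2 - xi_true ** 2) * T / 2.0 for xi in grid]
    return log_mean_exp(log_R) / n


rate = simulate_log_In_over_n(2000)
```

Comparing two such priors with different minimum divergences and taking the difference of the two rates gives a numerical counterpart of the limit $\delta_2 - \delta_1$ in the corollary.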


5 General asymptotic theory of Bayes factor in the non-iid set-up

In this section, we first develop a general asymptotic theory of Bayes factors in the non-iid set-up, and
then obtain the result for the non-iid SDE set-up as a special case of our general theory.


5.1 The basic set-up 


We assume that for $i = 1, \ldots, n$, $X_i \sim f_{0i}$, that is, the true density function corresponding to the $i$-th
individual is $f_{0i}$. Considering another arbitrary density $f_i$ for individual $i$, we investigate consistency
of the Bayes factor in this general non-iid set-up. For our purpose we introduce the following two
properties:


1. Kullback-Leibler ($\delta$) property in the non-iid set-up:


We denote the Kullback-Leibler divergence measure between $f_{0i}$ and $f_i$ by $\mathcal{K}(f_{0i}, f_i)$ and assume that
the limit

$$\mathcal{K}^{\infty}(f_0, f) = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} E\left[\log\frac{f_{0i}(X_i)}{f_i(X_i)}\right] = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \mathcal{K}(f_{0i}, f_i) \tag{5.1}$$

exists almost surely with respect to the prior $\pi$ on $f$. Let the prior distribution $\pi$ satisfy

$$\pi\left(f : \inf_i \mathcal{K}(f_{0i}, f_i) \ge \delta\right) = 1, \tag{5.2}$$

for some $\delta \ge 0$. Then we say that $\pi$ has the Kullback-Leibler ($\delta$) property if, for any $c > 0$,

$$\pi\left(f : \delta \le \mathcal{K}^{\infty}(f_0, f) \le \delta + c\right) > 0. \tag{5.3}$$
2. $Q^*$ property in the non-iid set-up:

Let us denote the posterior distribution corresponding to $n$ observations by $\pi_n$. We denote
$\pi(df_1, df_2, \ldots, df_n)$ by $\pi(df)$. For any set $A$,

$$\pi_n(A) = \frac{\int_A \prod_{i=1}^{n} f_i(X_i)\,\pi(df)}{\int \prod_{i=1}^{n} f_i(X_i)\,\pi(df)}$$

denotes the posterior probability of $A$. Let

$$R_n(f_1, f_2, \ldots, f_n) = \prod_{i=1}^{n} \frac{f_i(X_i)}{f_{0i}(X_i)}.$$

Let us define the posterior predictive density by

$$\hat f_n(X_n) = \int f_n(X_n)\,\pi_n(df_n),$$

and

$$\hat f_{nA}(X_n) = \int f_n(X_n)\,\pi_{nA}(df_n)$$

to be the posterior predictive density with posterior restricted to the set $A$, that is, for $\pi(A) > 0$,

$$\pi_{nA}(df_n) = \frac{I_A(f_n)\,\pi_n(df_n)}{\int_A \pi_n(df_n)}.$$

Then we say that the prior has the property $Q^*$ in the non-iid set-up if the following holds for any $\epsilon > 0$:

$$\liminf_{n}\ \mathcal{K}\left(f_{0n}, \hat f_{n, A_n(\epsilon)}\right) \ge \epsilon \tag{5.4}$$

when

$$A_n(\epsilon) = \left\{f_n : \mathcal{K}(f_{0n}, f_n) \ge \epsilon\right\}. \tag{5.5}$$

Let $I_0 = 1$ and for $n \ge 1$, let us define

$$I_n = \int R_n(f_1, f_2, \ldots, f_n)\,\pi(df), \tag{5.6}$$

which is relevant for the study of the Bayes factors. Regarding convergence of $I_n$, we formulate the
following theorem.


Theorem 7 Assume the non-iid set-up and that the limit (5.1) exists almost surely with respect to the
prior $\pi$. Also assume that the prior $\pi$ satisfies (5.2), and has the Kullback-Leibler ($\delta$) and $Q^*$ properties
given by (5.3) and (5.4), respectively. Assume further that

$$\sup_{i} E\left[\log\frac{f_{0i}(X_i)}{f_i(X_i)}\right]^2 < \infty \tag{5.7}$$

and

$$\sup_{n} E\left[\log\frac{I_n}{I_{n-1}}\right]^2 < \infty. \tag{5.8}$$

Then

$$n^{-1}\log I_n \to -\delta, \tag{5.9}$$

almost surely as $n \to \infty$.


Corollary 8 For $j = 1, 2$, let

$$I_{jn} = \int R_n(f_1, f_2, \ldots, f_n)\,\pi_j(d\tilde f),$$

where $\pi_1$ and $\pi_2$ are two different priors on $f$. Let $B_n = I_{1n}/I_{2n}$ denote the Bayes factor for comparing
the two models associated with $\pi_1$ and $\pi_2$. If both the models satisfy the conditions of Theorem 7 and
satisfy the Kullback-Leibler property with $\delta = \delta_1$ and $\delta = \delta_2$ respectively, then

$$n^{-1}\log B_n \to \delta_2 - \delta_1,$$

almost surely.
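The convergence in Theorem 7 and Corollary 8 can be checked numerically in a toy non-iid model that is not taken from the paper. The sketch below assumes hypothetical Gaussian observations $X_i \sim N(m_i, 1)$ with known, non-identical shifts $m_i$, candidate densities $N(m_i + \theta, 1)$, and a uniform prior on a finite set of $\theta$ values. In this toy model every divergence $\mathcal{K}(f_{0i}, f_i)$ equals $\theta^2/2$, so $\delta$ is the smallest value of $\theta^2/2$ on the prior support, and the scaled log Bayes factor should approach $\delta_2 - \delta_1$.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def simulate_data(n, seed=0):
    """Toy non-iid data: X_i ~ N(m_i, 1) with known shifts m_i (an assumption)."""
    rng = random.Random(seed)
    shifts = [float(i % 3) for i in range(n)]
    xs = [rng.gauss(m, 1.0) for m in shifts]
    return shifts, xs


def scaled_log_I(support, shifts, xs):
    """n^{-1} log I_n for a uniform prior on the finite set `support`."""
    log_R = []
    for theta in support:
        # log f_i(x) - log f_0i(x) for f_i = N(m_i + theta, 1), f_0i = N(m_i, 1)
        log_R.append(sum(theta * (x - m) - 0.5 * theta ** 2
                         for m, x in zip(shifts, xs)))
    return log_mean_exp(log_R) / len(xs)


shifts, xs = simulate_data(4000)
a = scaled_log_I([1.0, 1.5], shifts, xs)   # delta_1 = 0.5 in this toy model
b = scaled_log_I([2.0, 2.5], shifts, xs)   # delta_2 = 2.0 in this toy model
print(a, b, a - b)  # a - b is n^{-1} log B_n
```

The same data are used for both priors, as a Bayes factor requires. The finite prior support is chosen only so that $I_n$ can be computed exactly.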


6 Specialization of non-iid asymptotic theory of Bayes factors to non-iid
SDE set-up where $T_i$ are constants for every $i$ but $n \to \infty$


In this section we relax the restrictions $T_i = T$ and $x^i = x$ for $i = 1, \ldots, n$. In other words, here
we deal with the set-up where the processes $X_i(\cdot)$; $i = 1, \ldots, n$, are independently, but not identically
distributed. Following Maitra and Bhattacharya (2016c) and Maitra and Bhattacharya (2015), we assume
the following:


(H6') The sequences $\{T_1, T_2, \ldots\}$ and $\{x^1, x^2, \ldots\}$ are sequences in compact sets $\mathfrak{T}$ and $\mathfrak{X}$, respectively,
so that there exist convergent subsequences with limits in $\mathfrak{T}$ and $\mathfrak{X}$. For notational convenience, we
continue to denote the convergent subsequences as $\{T_1, T_2, \ldots\}$ and $\{x^1, x^2, \ldots\}$. Let us denote
the limits by $T^{\infty}$ and $x^{\infty}$, where $T^{\infty} \in \mathfrak{T}$ and $x^{\infty} \in \mathfrak{X}$.


Remark 9 Note that the choices of the convergent subsequences $\{T_1, T_2, \ldots\}$ and $\{x^1, x^2, \ldots\}$ are not
unique. However, this non-uniqueness does not affect asymptotic selection of the correct model via
Bayes factor. Indeed, as will be evident from our proof, for any choice of convergent subsequence, the
Bayes factor almost surely converges exponentially to the correct quantity. The reason for this is that we
actually need to deal with the infimum of the Kullback-Leibler distance over $\mathfrak{X}$ and $\mathfrak{T}$, which is of course
independent of the choices of subsequences; see Section 6.1 for the details.


Following Maitra and Bhattacharya (2016c), we denote the process associated with the initial value
$x$ and time point $t$ as $X(t, x)$, so that $X(t, x^i) = X_i(t)$, and $X_i = \{X_i(t);\ t \in [0, T_i]\}$.

Let $\theta_j = (\beta_j, \xi_j)$ for $j = 0, 1$ denote the set of finite number of parameters, where $\beta_j$ and $\xi_j$ have
the same interpretation as in the iid set-up. As before, $z_i(t) = (z_{i1}(t), z_{i2}(t), \ldots, z_{ip}(t))$ is the set of
covariate information corresponding to the $i$-th individual at time point $t$. For $x^i \in \mathfrak{X}$, $T_i \in \mathfrak{T}$, $z_i(t) \in \boldsymbol{Z}$
and $\theta_j \in \Theta$, let

$$U_{x^i,T_i,z_i,\theta_j} = \int_0^{T_i}\frac{\phi_{i,\xi_j}(s)\,b_{\beta_j}(X_i(s,x^i))}{\sigma^2(X_i(s,x^i))}\,dX_i(s,x^i); \tag{6.1}$$

$$V_{x^i,T_i,z_i,\theta_0,\theta_j} = \int_0^{T_i}\frac{\phi_{i,\xi_j}(s)\,\phi_{i,\xi_0}(s)\,b_{\beta_j}(X_i(s,x^i))\,b_{\beta_0}(X_i(s,x^i))}{\sigma^2(X_i(s,x^i))}\,ds. \tag{6.2}$$

As before, $V_{x^i,T_i,z_i,\theta_0} = V_{x^i,T_i,z_i,\theta_0,\theta_0}$ and $V_{x^i,T_i,z_i,\theta_1} = V_{x^i,T_i,z_i,\theta_1,\theta_1}$.

In this non-iid set-up $f_{0i} = f_{\theta_0,x^i,T_i,z_i}$ and $f_i = f_{\theta_1,x^i,T_i,z_i}$. An extension of Lemma 1 incorporating
$x$, $T$ and $z$ shows that moments of $U_{x,T,z,\theta_j}$, $V_{x,T,z,\theta_j}$ and $V_{x,T,z,\theta_0,\theta_j}$ of all orders exist, and are continuous
in $x$, $T$, $z$, $\theta_1$. Formally, we have the following lemma.
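Lemma 10 Assume (H1') - (H6'). Then for all $x \in \mathfrak{X}$, $T \in \mathfrak{T}$, $z \in \boldsymbol{Z}$ and $\theta_1 \in \Theta$, for $k \geq 1$,

$$E_{\theta_0}\left[U_{x,T,z,\theta_j}\right]^k < \infty;\quad j = 0, 1, \tag{6.3}$$

$$E_{\theta_0}\left[V_{x,T,z,\theta_j}\right]^k < \infty;\quad j = 0, 1, \tag{6.4}$$

$$E_{\theta_0}\left[V_{x,T,z,\theta_0,\theta_j}\right]^k < \infty;\quad j = 0, 1. \tag{6.5}$$

Moreover, the above expectations are continuous in $(x, T, z, \theta_1)$.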

In particular, the Kullback-Leibler distance is continuous in $x$, $T$, $z$ and $\theta_1$. The following lemma
asserts that the average of the Kullback-Leibler distance is also a Kullback-Leibler distance in the limit.


Lemma 11 The limiting average $\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\mathcal{K}\left(f_{\theta_0,x^k,T_k,z_k}, f_{\theta_1,x^k,T_k,z_k}\right)$ is also a Kullback-Leibler
distance.


Even in this non-iid context, the Bayes factor is of the same form as (2.6); however, for $j = 0, 1$,
$U_{x^i,T_i,z_i,\theta_j}$ and $V_{x^i,T_i,z_i,\theta_j}$ are not identically distributed for $i = 1, \ldots, n$. Next, we establish
strong consistency of Bayes factor in the non-iid SDE set-up by verifying the sufficient conditions of
Theorem 7.


6.1 Verification of (5.2) and the Kullback-Leibler property in the non-iid set-up

Firstly, note that in our case,

$$\mathcal{K}^{\infty}(f_0, f) = \mathcal{K}^{\infty}(f_{\theta_0}, f_{\theta_1}), \tag{6.6}$$

where the rightmost side, as asserted by Lemma 11, clearly exists almost surely with respect to $\theta_1$ and
is also continuous in $\theta_1$.

Now note that compactness of $\mathfrak{X}$, $\mathfrak{T}$ and $\boldsymbol{Z}$ along with continuity of the function $\phi_{\xi_j}$ and $\mathcal{K}(f_{\theta_0,x,T,z}, f_{\theta_1,x,T,z})$
with respect to $x$, $T$ and $z$ implies


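$$\begin{aligned}
&\inf_{x\in\mathfrak{X},\,T\in\mathfrak{T},\,z\in\boldsymbol{Z}}\mathcal{K}(f_{\theta_0,x,T,z}, f_{\theta_1,x,T,z})\\
&= \inf_{x\in\mathfrak{X},\,T\in\mathfrak{T},\,z\in\boldsymbol{Z}}\int_0^{T}\left\{\frac{\phi^2_{\xi_0}(z(s))}{2}E_{\theta_0}\left(V_{x,\beta_0}(s)\right) - \phi_{\xi_0}(z(s))\,\phi_{\xi_1}(z(s))\,E_{\theta_0}\left(V_{x,\beta_0,\beta_1}(s)\right) + \frac{\phi^2_{\xi_1}(z(s))}{2}E_{\theta_0}\left(V_{x,\beta_1}(s)\right)\right\}ds\\
&= \inf_{x\in\mathfrak{X},\,T\in\mathfrak{T},\,z(s(T))\in\boldsymbol{Z}} T\left\{\frac{\phi^2_{\xi_0}(z(s(T)))}{2}E_{\theta_0}\left(V_{x,\beta_0}(s(T))\right) - \phi_{\xi_0}(z(s(T)))\,\phi_{\xi_1}(z(s(T)))\,E_{\theta_0}\left(V_{x,\beta_0,\beta_1}(s(T))\right) + \frac{\phi^2_{\xi_1}(z(s(T)))}{2}E_{\theta_0}\left(V_{x,\beta_1}(s(T))\right)\right\},
\end{aligned} \tag{6.7}$$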


by the mean value theorem for integrals, where $s(T) \in [0, T]$ is such that the above equality holds. Also
note that the expression in (6.7) is continuous in $T$ since originally the integral on $[0, T]$ is continuous in
$T$. Now note that if $|T - \tilde T| < \delta_1(\epsilon)$ is such that $|\phi_{\xi_j}(z(s(T))) - \phi_{\xi_j}(z(s(\tilde T)))| < \frac{\epsilon}{2}$ due to continuity in
$T$, and if $|z(s(\tilde T)) - \tilde z(s(\tilde T))| < \delta_2(\epsilon)$ is such that $|\phi_{\xi_j}(z(s(\tilde T))) - \phi_{\xi_j}(\tilde z(s(\tilde T)))| < \frac{\epsilon}{2}$ due to continuity
of $\phi_{\xi_j}$ in $z$, then $|\phi_{\xi_j}(z(s(T))) - \phi_{\xi_j}(\tilde z(s(\tilde T)))| \leq |\phi_{\xi_j}(z(s(T))) - \phi_{\xi_j}(z(s(\tilde T)))| + |\phi_{\xi_j}(z(s(\tilde T))) - \phi_{\xi_j}(\tilde z(s(\tilde T)))| < \epsilon$,
showing that $\phi_{\xi_j}(z(s(T)))$ is continuous in $T$ and $z(s(T))$, which also belong to
compact spaces. Hence, from (6.7) it follows that


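$$\begin{aligned}
\psi(\theta_1) = T^*(\theta_1)\Bigg\{&\frac{\phi^2_{\xi_0}\left(z^*(\theta_1)(s(T^*(\theta_1)))\right)}{2}E_{\theta_0}\left(V_{x^*(\theta_1),\beta_0}(s(T^*(\theta_1)))\right)\\
&- \phi_{\xi_0}\left(z^*(\theta_1)(s(T^*(\theta_1)))\right)\phi_{\xi_1}\left(z^*(\theta_1)(s(T^*(\theta_1)))\right)E_{\theta_0}\left(V_{x^*(\theta_1),\beta_0,\beta_1}(s(T^*(\theta_1)))\right)\\
&+ \frac{\phi^2_{\xi_1}\left(z^*(\theta_1)(s(T^*(\theta_1)))\right)}{2}E_{\theta_0}\left(V_{x^*(\theta_1),\beta_1}(s(T^*(\theta_1)))\right)\Bigg\},
\end{aligned} \tag{6.8}$$

where $x^*(\theta_1) \in \mathfrak{X}$, $T^*(\theta_1) \in \mathfrak{T}$, $z^*(\theta_1)(s(T^*(\theta_1))) \in \boldsymbol{Z}$ depend upon $\theta_1$. Then, considering the
constant correspondence function $\gamma(\theta_1) = \mathfrak{X} \times \mathfrak{T} \times \boldsymbol{Z}$, for all $\theta_1 \in \Theta$, we note that $\gamma$ is both upper and
lower hemicontinuous (hence continuous), and also compact-valued. Hence, Berge's maximum theorem
(Berge (1963)) guarantees that (6.8) is a continuous function of $\theta_1$.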

Because of continuity of $\psi(\theta_1)$ in $\theta_1$, the set $\{\theta_1 : \psi(\theta_1) > \delta\}$ is open and can be assigned any
desired probability by choosing appropriate priors dominated by the Lebesgue measure. In particular, we
can assign prior probability one to this set. Now, because of the inequality

$$\pi\left(\theta_1 : \inf_{i}\mathcal{K}\left(f_{\theta_0,x^i,T_i,z_i}, f_{\theta_1,x^i,T_i,z_i}\right) \geq \delta\right) \geq \pi\left(\theta_1 : \psi(\theta_1) > \delta\right),$$

and since we choose $\pi$ such that $\pi(\theta_1 : \psi(\theta_1) > \delta) = 1$, it follows that

$$\pi\left(\theta_1 : \inf_{i}\mathcal{K}\left(f_{\theta_0,x^i,T_i,z_i}, f_{\theta_1,x^i,T_i,z_i}\right) \geq \delta\right) = 1,$$

satisfying (5.2).

The Kullback-Leibler property of the Lebesgue measure dominated $\pi$ easily follows from continuity
of $\mathcal{K}^{\infty}(f_{\theta_0}, f_{\theta_1})$ in $\theta_1$.



6.2 Verification of the $Q^*$ property in the non-iid set-up

Observe that in this situation, for any $\epsilon > 0$,

$$A_n(\epsilon) = \left\{f_n : \mathcal{K}(f_{0n}, f_n) \geq \epsilon\right\} = \left\{\theta_1 : \mathcal{K}\left(f_{\theta_0,x^n,T_n,z_n}, f_{\theta_1,x^n,T_n,z_n}\right) \geq \epsilon\right\}.$$

Then note that

$$\hat f_{nA_n(\epsilon)}(X) \leq \sup_{\theta_1\in A_n(\epsilon)} f_{\theta_1,x^n,T_n,z_n}(X) = f_{\hat\theta_1(X,x^n,T_n,z_n),\,x^n,T_n,z_n}(X), \tag{6.9}$$

where $\hat\theta_1(X, x^n, T_n, z_n)$, which depends upon $X$, $x^n$, $T_n$, $z_n$, is the maximizer lying in the compact set
$A_n(\epsilon)$. Now,


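$$\begin{aligned}
\mathcal{K}\left(f_{\theta_0,x^n,T_n,z_n}, \hat f_{nA_n(\epsilon)}\right) &= E_{\theta_0}\left[\log f_{\theta_0,x^n,T_n,z_n}(X)\right] - E_{\theta_0}\left[\log \hat f_{nA_n(\epsilon)}(X)\right]\\
&\geq E_{\theta_0}\left[\log f_{\theta_0,x^n,T_n,z_n}(X)\right] - E_{\theta_0}\left[\log f_{\hat\theta_1(X,x^n,T_n,z_n),\,x^n,T_n,z_n}(X)\right]\\
&= E_{\theta_0}\left[\log\frac{f_{\theta_0,x^n,T_n,z_n}(X)}{f_{\hat\theta_1(X,x^n,T_n,z_n),\,x^n,T_n,z_n}(X)}\right]\\
&= E_{\hat\theta_1(X,x^n,T_n,z_n)\mid\theta_0}\,E_{X\mid\hat\theta_1(X,x^n,T_n,z_n)=\varphi,\,\theta_0}\left[\log\frac{f_{\theta_0,x^n,T_n,z_n}(X)}{f_{\hat\theta_1(X,x^n,T_n,z_n)=\varphi,\,x^n,T_n,z_n}(X)}\right]\\
&= E_{\hat\theta_1(X,x^n,T_n,z_n)\mid\theta_0}\,\mathcal{K}\left(f_{\theta_0,x^n,T_n,z_n}, f_{\varphi,x^n,T_n,z_n}\right)\\
&\geq E_{\hat\theta_1(X,x^n,T_n,z_n)\mid\theta_0}\,\inf_{\eta\in A_n(\epsilon)}\mathcal{K}\left(f_{\theta_0,x^n,T_n,z_n}, f_{\eta,x^n,T_n,z_n}\right)\\
&= E_{\hat\theta_1(X,x^n,T_n,z_n)\mid\theta_0}\,\mathcal{K}\left(f_{\theta_0,x^n,T_n,z_n}, f_{\eta^*_n,x^n,T_n,z_n}\right)\\
&\geq \epsilon,
\end{aligned} \tag{6.10}$$

where $\eta^*_n = \arg\min_{\eta\in A_n(\epsilon)}\mathcal{K}\left(f_{\theta_0,x^n,T_n,z_n}, f_{\eta,x^n,T_n,z_n}\right) \in A_n(\epsilon)$, due to compactness of $A_n(\epsilon)$. Hence, (5.4)
is satisfied in our non-iid SDE set-up.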


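6.3 Verification of (5.7)

From Lemma 10 it follows that $E_{\theta_0}\left[\log\frac{f_{\theta_0,x,T,z}(X)}{f_{\theta_1,x,T,z}(X)}\right]^2$ exists and is continuous in $\theta_1$, $x$, $T$ and $z$. Then
compactness of $\Theta$, $\mathfrak{X}$, $\mathfrak{T}$ and $\boldsymbol{Z}$ ensures (5.7).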


6.4 Verification of (5.8)

For the non-iid case, the following identity holds:

$$\frac{I_{n+1}}{I_n} = \frac{\hat f_{n+1}(X_{n+1})}{f_{0,n+1}(X_{n+1})} = \frac{\hat f_{x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})}{f_{\theta_0,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})}, \tag{6.11}$$

where

$$\hat f_{x^{n+1},T_{n+1},z_{n+1}}(\cdot) = E_{\theta_1\mid X_1,\ldots,X_n}\left[f_{\theta_1,x^{n+1},T_{n+1},z_{n+1}}(\cdot)\right] \tag{6.12}$$

is the posterior predictive distribution of $f_{\theta_1,x^{n+1},T_{n+1},z_{n+1}}(\cdot)$, with respect to the posterior of $\theta_1$, given
$X_1, \ldots, X_n$.


Now since $\log f_{\theta_0,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1}) = U_{x^{n+1},T_{n+1},z_{n+1},\theta_0} - \frac{1}{2}V_{x^{n+1},T_{n+1},z_{n+1},\theta_0}$, using Lemma
10 and compactness of $\Theta$, $\mathfrak{X}$, $\mathfrak{T}$ and $\boldsymbol{Z}$ it is easy to see that the moments of $\log f_{\theta_0,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})$
are uniformly bounded above. So, we have

$$\sup_{n} E\left(\log f_{\theta_0,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})\right)^2 < \infty. \tag{6.13}$$

As in the iid case, here also we have


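$$\begin{aligned}
f_{\theta_1^*(X_{n+1},x^{n+1},T_{n+1},z_{n+1}),\,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1}) &= \inf_{\theta_1\in\Theta} f_{\theta_1,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})\\
&\leq \hat f_{x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})\\
&\leq \sup_{\theta_1\in\Theta} f_{\theta_1,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1}) = f_{\theta_1^{**}(X_{n+1},x^{n+1},T_{n+1},z_{n+1}),\,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1}),
\end{aligned}$$

where $\theta_1^*(X_{n+1}, x^{n+1}, T_{n+1}, z_{n+1}) = \arg\min_{\theta_1\in\Theta} f_{\theta_1,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1}) \in \Theta$ and
$\theta_1^{**}(X_{n+1}, x^{n+1}, T_{n+1}, z_{n+1}) = \arg\max_{\theta_1\in\Theta} f_{\theta_1,x^{n+1},T_{n+1},z_{n+1}}(X_{n+1}) \in \Theta$. Note that each component of
$\theta_1^*(X_{n+1}, x^{n+1}, T_{n+1}, z_{n+1})$ and $\theta_1^{**}(X_{n+1}, x^{n+1}, T_{n+1}, z_{n+1})$ depends on $X_{n+1}$, $x^{n+1}$, $T_{n+1}$, $z_{n+1}$.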

It follows, as in the iid case, that

$$\begin{aligned}
U_{\theta_1^*,x^{n+1},T_{n+1},z_{n+1}} - \frac{V_{\theta_1^*,x^{n+1},T_{n+1},z_{n+1}}}{2} &\leq \log \hat f_{x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})\\
&\leq U_{\theta_1^{**},x^{n+1},T_{n+1},z_{n+1}} - \frac{V_{\theta_1^{**},x^{n+1},T_{n+1},z_{n+1}}}{2}\\
&\leq \left|U_{\theta_1^{**},x^{n+1},T_{n+1},z_{n+1}}\right| + \frac{V_{\theta_1^{**},x^{n+1},T_{n+1},z_{n+1}}}{2}.
\end{aligned}$$

Proceeding in the same way as in the iid case, and exploiting Lemma 10, we obtain

$$\sup_{n} E\left(\log \hat f_{x^{n+1},T_{n+1},z_{n+1}}(X_{n+1})\right)^2 < \infty. \tag{6.14}$$


Thus, as in the iid set-up, (5.8) follows from (6.13) and (6.14).

We formalize the above arguments in the form of a theorem in our non-iid SDE set-up.

Theorem 12 Assume the non-iid SDE set-up and conditions (H1') - (H6'). Then (5.9) holds.


As in the previous cases, the following corollary provides asymptotic comparison between two models
using Bayes factor in the non-iid SDE set-up.

Corollary 13 For $j = 1, 2$, let $R_{jn}(\theta_j) = \prod_{i=1}^{n}\frac{f_{\theta_j,x^i,T_i,z_i}(X_i)}{f_{\theta_0,x^i,T_i,z_i}(X_i)}$, where $\theta_1$ and $\theta_2$ are two different
finite sets of parameters, perhaps with different dimensionalities, associated with the two models to be
compared. For $j = 1, 2$, let

$$I_{jn} = \int R_{jn}(\theta_j)\,\pi_j(d\theta_j),$$

where $\pi_j$ is the prior on $\theta_j$. Let $B_n = I_{1n}/I_{2n}$ as before. Assume the non-iid SDE set-up and suppose
that both the models satisfy (H1') - (H6'), and have the Kullback-Leibler property with $\delta = \delta_1$ and
$\delta = \delta_2$ respectively. Then

$$n^{-1}\log B_n \to \delta_2 - \delta_1,$$

almost surely.
7 Simulation studies


7.1 Covariate selection when $n = 15$, $T = 1$

We demonstrate, with a simulation study, the finite-sample analogue of the Bayes factor analysis as $n \to \infty$
with $T$ fixed. In this regard, we consider $n = 15$ individuals, where the $i$-th one is modeled by

$$dX_i(t) = \left(\xi_1 + \xi_2 z_1(t) + \xi_3 z_2(t) + \xi_4 z_3(t)\right)\left(\xi_5 + \xi_6 X_i(t)\right)dt + \sigma_i\,dW_i(t), \tag{7.1}$$

for $i = 1, \ldots, 15$. We fix our diffusion coefficients as $\sigma_{i+1} = \sigma_i + 5$ for $i = 1, \ldots, 14$, where $\sigma_1 = 10$.
We consider the initial value $X(0) = 0$ and the time interval $[0, T]$ with $T = 1$.

To achieve numerical stability of the marginal likelihood corresponding to each data set, we choose the
true values of $\xi_i$; $i = 1, \ldots, 6$, as follows: $\xi_i \sim N(\mu_i, 0.001^2)$, where $\mu_i \sim N(0, 1)$. This is not to be
interpreted as the prior; this is just a means to set the true values of the parameters of the data-generating
model.

We assume that the time-dependent covariates $z_i(t)$ satisfy the following SDEs:

$$\begin{aligned}
dz_1(t) &= \left(\theta_1 + \theta_2 z_1(t)\right)dt + dW_1(t)\\
dz_2(t) &= \theta_3\,dt + dW_2(t)\\
dz_3(t) &= \theta_4 z_3(t)\,dt + dW_3(t),
\end{aligned} \tag{7.2}$$

where $W_i(\cdot)$; $i = 1, 2, 3$, are independent Wiener processes, and $\theta_i \sim N(0, 0.01^2)$ for $i = 1, \ldots, 4$.

We obtain the covariates by first simulating $\theta_i \sim N(0, 0.01^2)$ for $i = 1, \ldots, 4$, fixing the values,
and then simulating the covariates using the SDEs (7.2) by discretizing the time interval $[0, 1]$ into
500 equispaced time points. In all our applications we have standardized the covariates over time so that
they have zero means and unit variances.

Once the covariates are thus obtained, we assume that the data are generated from the (true) model
where all the covariates are present. For the true values of the parameters, we simulated $(\xi_1, \ldots, \xi_6)$
from the prior and treated the obtained values as the true set of parameters $\theta_0$. We then generated the
data using (7.1) by discretizing the time interval $[0, 1]$ into 500 equispaced time points.
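The data-generating steps described above can be sketched with an Euler-Maruyama discretization. This is only an illustration under stated assumptions, not the authors' code: the text does not name the discretization scheme, the covariate initial values (taken as 0 here), or the random seed, and the function names are hypothetical. The drift uses the form $(\xi_1 + \xi_2 z_1 + \xi_3 z_2 + \xi_4 z_3)(\xi_5 + \xi_6 X_i)$ as read from (7.1), and the covariates are shared across individuals.

```python
import math
import random


def standardize(path):
    """Centre and scale a path over time to zero mean and unit variance."""
    mean = sum(path) / len(path)
    sd = math.sqrt(sum((v - mean) ** 2 for v in path) / len(path))
    return [(v - mean) / sd for v in path]


def simulate_covariates(theta, n_points=500, T=1.0, rng=None):
    """Euler-Maruyama paths of the three covariate SDEs, started at 0 (assumed)."""
    rng = rng or random.Random(0)
    dt = T / (n_points - 1)
    z1, z2, z3 = [0.0], [0.0], [0.0]
    for _ in range(n_points - 1):
        dw1, dw2, dw3 = (rng.gauss(0.0, math.sqrt(dt)) for _ in range(3))
        a, b, c = z1[-1], z2[-1], z3[-1]
        z1.append(a + (theta[0] + theta[1] * a) * dt + dw1)
        z2.append(b + theta[2] * dt + dw2)
        z3.append(c + theta[3] * c * dt + dw3)
    return [standardize(z1), standardize(z2), standardize(z3)]


def simulate_path(xi, z, sigma, T=1.0, rng=None, x0=0.0):
    """Euler-Maruyama path of one individual with constant diffusion sigma."""
    rng = rng or random.Random(0)
    n_points = len(z[0])
    dt = T / (n_points - 1)
    x = [x0]
    for k in range(n_points - 1):
        phi = xi[0] + xi[1] * z[0][k] + xi[2] * z[1][k] + xi[3] * z[2][k]
        drift = phi * (xi[4] + xi[5] * x[-1])
        x.append(x[-1] + drift * dt + sigma * rng.gauss(0.0, math.sqrt(dt)))
    return x


rng = random.Random(2016)                      # arbitrary seed
theta = [rng.gauss(0.0, 0.01) for _ in range(4)]
xi_true = [rng.gauss(rng.gauss(0.0, 1.0), 0.001) for _ in range(6)]
z = simulate_covariates(theta, rng=rng)
sigmas = [10.0 + 5.0 * i for i in range(15)]   # sigma_1 = 10, step 5
paths = [simulate_path(xi_true, z, s, rng=rng) for s in sigmas]
```

Each entry of `paths` is one individual's trajectory on the common grid of 500 time points.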

Since we have three covariates, there are $2^3 = 8$ different models. Denoting a model by the
presence and absence of the respective covariates, it then is the case that $(1,1,1)$ is the true, data-generating
model, while $(0,0,0)$, $(0,0,1)$, $(0,1,0)$, $(0,1,1)$, $(1,0,0)$, $(1,0,1)$, and $(1,1,0)$ are the
other 7 possible models.
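The model space just described can be enumerated mechanically. The short sketch below lists the 8 inclusion patterns and shows one hypothetical way to turn a pattern into a coefficient vector, by zeroing the coefficients of excluded covariates. The helper name `mask_coefficients` and the convention that positions 1 to 3 of the coefficient vector multiply the three covariates are assumptions made for illustration.

```python
from itertools import product

# All inclusion patterns for three covariates, from (0, 0, 0) to (1, 1, 1).
models = list(product((0, 1), repeat=3))


def mask_coefficients(model, xi):
    """Zero the covariate coefficients xi[1..3] that the model excludes."""
    masked = list(xi)
    for position, included in enumerate(model, start=1):
        if not included:
            masked[position] = 0.0
    return masked


for model in models:
    print(model, mask_coefficients(model, [0.5, 1.0, -2.0, 3.0, 0.1, 0.2]))
```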


7.1.1 Case 1: the true parameter set $\theta_0$ is fixed

Prior on $\theta$

For the prior $\pi$ on $\theta$, we first obtain the maximum likelihood estimator (MLE) of $\theta$ using simulated
annealing (see, for example, Liu (2001), Robert and Casella (2004)), and consider a normal prior where
the mean is the MLE of $\xi_i$ for $i = 1, \ldots, 6$ and the variance is $0.8^2 I_6$, $I_6$ being the 6-dimensional
identity matrix. As will be seen, this results in consistent model selection using Bayes factor.

Form of the Bayes factor

In this case the related Bayes factor has the form

$$I_n = \int\prod_{i=1}^{n}\frac{f_{i,\theta_1}(X_i)}{f_{i,\theta_0}(X_i)}\,\pi(d\theta_1),$$

where $\theta_0 = (\xi_{0,1}, \xi_{0,2}, \xi_{0,3}, \xi_{0,4}, \xi_{0,5}, \xi_{0,6})$ is the true parameter set and $\theta_1 = (\xi_1, \xi_2, \xi_3, \xi_4, \xi_5, \xi_6)$ is
the unknown set of parameters corresponding to any other model. Table 7.1 describes the results of our
Bayes factor analyses. Every one of the 7 values in the table is negative, so the correct model $(1,1,1)$ is always
preferred.


Table 7.1: Bayes factor results

Model      $\frac{1}{15}\log I_{15}$
(0,0,0)    -3.25214
(0,0,1)    -1.39209
(0,1,0)    -3.31954
(0,1,1)    -1.11729
(1,0,0)    -3.40378
(1,0,1)    -1.22529
(1,1,0)    -3.46790
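The integrand of the Bayes factor is a product of likelihood ratios, each of which has the form $\exp(U - V/2)$ used throughout the theory. A minimal sketch of a discretized version follows. It assumes a constant diffusion coefficient per individual and the drift form read from (7.1), and it replaces the stochastic integral $U$ and the time integral $V$ by left-point sums on the observation grid. The function names are hypothetical, and the paper does not state how its integrals were approximated.

```python
def log_likelihood(x, z, xi, sigma, T=1.0):
    """Discretized U - V/2 for one path x observed on an equispaced grid."""
    n_steps = len(x) - 1
    dt = T / n_steps
    u = 0.0  # approximates the stochastic integral of drift / sigma^2 dX
    v = 0.0  # approximates the time integral of drift^2 / sigma^2 ds
    for k in range(n_steps):
        phi = xi[0] + xi[1] * z[0][k] + xi[2] * z[1][k] + xi[3] * z[2][k]
        drift = phi * (xi[4] + xi[5] * x[k])
        u += drift / sigma ** 2 * (x[k + 1] - x[k])
        v += drift ** 2 / sigma ** 2 * dt
    return u - 0.5 * v


def log_ratio(paths, z, xi_1, xi_0, sigmas, T=1.0):
    """Sum over individuals of log f_{i,theta_1}(X_i) - log f_{i,theta_0}(X_i)."""
    return sum(
        log_likelihood(x, z, xi_1, s, T) - log_likelihood(x, z, xi_0, s, T)
        for x, s in zip(paths, sigmas)
    )


# Tiny deterministic check: a flat path has zero increments, so U = 0.
flat_path = [0.0, 0.0, 0.0]
no_covariates = [[0.0] * 3, [0.0] * 3, [0.0] * 3]
value = log_likelihood(flat_path, no_covariates, [1.0, 0.0, 0.0, 0.0, 2.0, 0.0], 1.0)
print(value)
```

Integrating `exp(log_ratio(...))` against the prior on the candidate parameters then gives an approximation of $I_n$.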


7.1.2 Case 2: the parameter set $\theta_0$ is random and has the prior distribution $\pi$

We consider the same form of the prior $\pi$ as in Section 7.1.1, but with variance $0.1^2 I_6$. The smaller
variance compared to that in Case 1 attempts to compensate somewhat for the lack of
precise information about the true parameter values.

In this case we calculate the marginal log-likelihood of the 8 possible models as

$$\ell_i = \frac{1}{15}\log\int\prod_{j=1}^{15} f_{j,\theta_i}(X_j)\,\pi(d\theta_i);\quad i = 1, \ldots, 8,$$

with $\ell_8$ corresponding to the true model. Table 7.2 shows that $\ell_8$ is the highest. This indicates that
the Bayes factor selects the correct set of covariates even though the parameters of the true
model are not fixed.
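One common way to approximate a scaled marginal log-likelihood of this kind is plain Monte Carlo over prior draws, combined with a log-mean-exp step so that very small likelihood values do not underflow. The sketch below illustrates that generic technique only. The paper does not say how its integrals were evaluated, and the names `scaled_log_marginal`, `log_lik` and `sample_prior` are hypothetical.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def scaled_log_marginal(log_lik, sample_prior, n_individuals, n_draws=1000, rng=None):
    """Monte Carlo estimate of (1/n) log of the prior-averaged likelihood.

    log_lik(theta) returns the joint log-likelihood of all individuals;
    sample_prior(rng) returns one draw of theta from the prior.
    """
    rng = rng or random.Random(0)
    draws = [log_lik(sample_prior(rng)) for _ in range(n_draws)]
    return log_mean_exp(draws) / n_individuals


# Toy usage with a made-up quadratic log-likelihood and a N(0, 0.1^2) prior.
estimate = scaled_log_marginal(
    log_lik=lambda theta: -0.5 * theta ** 2,
    sample_prior=lambda rng: rng.gauss(0.0, 0.1),
    n_individuals=15,
)
print(estimate)
```

Plain prior sampling can be inefficient when the likelihood is sharply peaked relative to the prior, so the number of draws is a tuning choice.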


Table 7.2: Values of $\frac{1}{15}\times$ marginal log-likelihoods

Model      $\ell_i$
(0,0,0)    2.42430
(0,0,1)    4.29608
(0,1,0)    1.75213
(0,1,1)    4.84717
(1,0,0)    1.56242
(1,0,1)    4.92628
(1,1,0)    0.47111
(1,1,1)    5.84665 (true model)


8 Summary and conclusion 

In this article we have investigated the asymptotic theory of Bayes factors when the models are associated
with systems of SDEs consisting of sets of time-dependent covariates. The model selection
problem we consider encompasses appropriate selection of a subset of covariates, as well as appropriate
selection of the part of the drift function that does not involve covariates. To our knowledge, no such
undertaking has appeared in the literature before.

We have established almost sure exponential convergence of the Bayes factor when the time domains
remain bounded but the number of individuals tends to infinity, in both iid and non-iid cases. In the
non-iid context, we proposed and proved general results on Bayes factor asymptotics, which should be of
independent interest.
Our simulation studies demonstrate that the Bayes factor is a reliable criterion even in non-asymptotic
situations for capturing the correct set of covariates in our SDE set-ups.

Note that our theory for non-iid situations readily extends to model comparison problems when one
of the models is associated with an iid system of SDEs and another with a non-iid system of SDEs.
For instance, if the true model is associated with an iid system, then $f_{0i} \equiv f_0 = f_{\theta_0}$, and the rest of
the theory remains the same as our non-iid theory of Bayes factors. The case when the other model is
associated with an iid system is analogous.
Supplementary Material


Throughout, we refer to our main manuscript Maitra and Bhattacharya (2016a) as MB.


S-1 Proof of Lemma 1 of MB

We first consider $k = 1$. Note that due to assumption (H4'),

$$E_{\theta_0}\left(V_{i,\theta_j}\right) \leq T_i K_{\beta_j}\left(\sup_{s\in[0,T_i]}\phi^2_{i,\xi_j}(s)\right)\left(1 + \sup_{s\in[0,T_i]}E_{\theta_0}\left(X_i^2(s)\right) + \|\beta_j\|^2\right) < \infty,$$

since, by Proposition 1 of Delattre et al. (2013), $\sup_{s\in[0,T_i]}E_{\theta_0}\left(X_i^{2\ell}(s)\right) < \infty$, for $\ell \geq 1$, and since
$\sup_{s\in[0,T_i]}\phi^2_{i,\xi_j}(s)$ is bounded above due to continuity of $g_l$; $l = 1, \ldots, p$. Hence, (3.6) of MB holds.

Now observe that due to Cauchy-Schwartz and (H4 / ) 


E 0O {V t) g (h 0 :i ) = 

Jo 


h 0 0 I 


Si 

* _ 

VI 

1-1 

fcq 

o 

<T { 

sup 

^sG[0,Tj] 


o 

X 

< oo, 



° 2 (Xi(s)) 

* 2 (Xi(s )) , 


ds 


E e 0 


a^Us)) 


ds 


sG[0,Ti] 


xK L\ SU P ^ 0 ( s ) 1+ SU P E Oo{X?(s))+ \\(3 0 \\ 


i sG[0,Ti 


sG[0,Ti] 


\se[o,Tj] ' SJ J 

holds. Also note that since $E_{\theta_0}(U_{i,\theta_1}) = E_{\theta_0}(V_{i,\theta_0,\theta_1})$ by (2.11) and (2.13), (3.5) is implied by (3.7)
of MB. To see that the moments are continuous in $\theta_1$, let $\{\theta_1^{(m)}\}_{m=1}^{\infty}$ be a sequence converging to $\tilde\theta_1$
as $m \to \infty$. Due to (H3'),

(P 2 Am)(s)b 2 (m) {Xi(s)) (s)&| (Xj(s)) 

Z 5S1 Pi l >4l Pi 


a^X^s)) 


a 2 (Ms)) 


and 


(s)^( m) (X / (s))6 /3o (2fi(s)) 0 .^(s)^ o (s) 6 ^ i (X i (s)) 6 / 3 o (X i (s)) 


a 2 {X t {s)) 


<T 2 {X t {s)) 


^ 2 ,(m)( S )^(m) W( S )) 


i £' ' /3' 

as m —> oo, for any given sample path (3Q(s) : s € [0, T,]}. Assumption (H4') implies that ——— ^ x\ s )) 

is dominated by sup (s) x sup Kp 1 1+ sup [Aj(s)] 2 + sup \\/3i \\ 2 . Since Aj(,s) 

CjGr.selO,Ti] ,?1 ^G® V sG [0,Tj] _ ^G® J 

is continuous on [0,1)], (guaranteed by (H2 7 ); see Delattre et «/.l ( 201 3h ). it follows that f E pQ(s)] 2 ds < 
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oo, which, in turn guarantees, in conjunction with compactness of 53 and T. that the upper bound is in- 
tegrable. Hence, V. g ( m ) —> I j 0 1 , almost surely. Now, for all rn > 1, 


Y,ej m) < Ti 


( sup 

£1 sr ,s€[o,Td 



x 


sup Kp 


X 


1 + sup 

y 5G[0,Tj] 


[2Q(s)] 2 + sup 
/3 iGB 



Since Eg, ( sup [2Q(s)] 2 ) < oo by (3.1) of MB, it follows that Eg, (v. ( m )) —> Eg 0 Y as 
\sG[0,Tj] J v *’ Wl J V - V 


iH 


0 1 j. Hence, Eg 0 (VgeJ is continuous in 6 


4*. Am) ^ (m) (-^i( 5 )) 


■i £ 

In the case of Vi^g, ) g 1 , the relevant quantity —— 
continuous function (hence integrable on [0, T*]) 


j£i_ 


is dominated by the 


- 2 

sup < / > i, 4 1 ('S) x sup Kg X 1+ sup [2Q(s)] + sup ||/3 1 

t . /-To /n 1 i 1 , . rn 1 /3 cvy 


, €ier,se[o,Tj] 


,/3lGB 


5G [0,Ti] 


/3 ieB 


x sup |^4 0 (s)| x iTj 1+ sup [Xj(s)] 2 + ||/3 


t SG[0,Tj] 


sG[0,Ti] 


2 

Oil ^ 


which ensures V. Q 0 ( m ) — > V i 0q 0i , almost surely. Using the above bound for 


. ^(m) (*<«)**> (-^( S )) 

—- i(X,(s)) -’ il 18 Seen tIlat 


V l ,e 0 ,e[ m) < T * Al 


sup [Xi(s )] 4 + K 2 sup 

ysG[0,Ti] sG[0,7Y] 


PQ(s)] 2 + K, 


for appropriate positive constants K \, if2, K >, so that (3.1) of MB for k = 4, guarantees that Eg 0 ( V. 

Eg 0 (v i Oo ^ , as 9\ m> —» 0\. This shows that Eg 0 is continuous as well. Since Eg 0 {Ui t g x ) = 

Eg 0 it follows that Eg 0 (C/geJ is continuous in 6 \. 

We now consider $k \geq 2$. Note that, due to (H4'), and the inequality $(a + b)^k \leq 2^{k-1}(|a|^k + |b|^k)$ for
$k \geq 2$ and any $a$, $b$,

$$E_{\theta_0}\left(V_{i,\theta_j}\right)^k \leq \left(\sup_{s\in[0,T_i]}\phi^2_{i,\xi_j}(s)\right)^k 2^{k-1}T_i^k K^k_{\beta_j}\left(1 + \|\beta_j\|^2\right)^k + \left(\sup_{s\in[0,T_i]}\phi^2_{i,\xi_j}(s)\right)^k 2^{k-1}T_i^k K^k_{\beta_j}E\left(\sup_{s\in[0,T_i]}[X_i(s)]^{2k}\right).$$

Since $E\left(\sup_{s\in[0,T_i]}[X_i(s)]^{2k}\right) < \infty$ due to (3.1) of MB, and because $K_{\beta_j}$, $\|\beta_j\|$ are continuous in
compact $\mathfrak{B}$, and $\sup_{s\in[0,T_i]}\phi^2_{i,\xi_j}(s)$ is continuous in compact $\Gamma$, it holds that $E_{\theta_0}(V_{i,\theta_j})^k < \infty$. In a
similar manner it can be shown that $E_{\theta_0}(V_{i,\theta_0,\theta_1})^k < \infty$. Thus, (3.7) of MB follows.
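The elementary inequality invoked in this step follows from convexity. The short derivation below is a standard argument added for completeness and is not taken from MB.

```latex
% Convexity of t -> |t|^k for k >= 1 gives, for any real a and b,
\left|\frac{a+b}{2}\right|^{k} \le \frac{|a|^{k} + |b|^{k}}{2}.
% Multiplying both sides by 2^k yields
|a+b|^{k} \le 2^{k-1}\left(|a|^{k} + |b|^{k}\right),
% and (a+b)^k <= |a+b|^k completes the claim.
```

Applied with $a = 1 + \|\beta_j\|^2$ and $b = \sup_{s}[X_i(s)]^2$, this splits the $k$-th power of the bound from (H4') into the two terms used above.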

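To see that (3.5) of MB holds, note that, due to (2.11) of MB and $(a + b)^k \leq 2^{k-1}(|a|^k + |b|^k)$,

$$E_{\theta_0}\left(U_{i,\theta_j}\right)^k \leq 2^{k-1}E_{\theta_0}\left(V_{i,\theta_0,\theta_j}\right)^k + 2^{k-1}E_{\theta_0}\left(\int_0^{T_i}\frac{\phi_{i,\xi_j}(s)\,b_{\beta_j}(X_i(s))}{\sigma(X_i(s))}\,dW_i(s)\right)^k. \tag{S-1.1}$$

Since, due to (H4'), (3.1) of MB and continuity of $\phi_{i,\xi_j}$ on compact spaces,

$$E_{\theta_0}\int_0^{T_i}\left|\frac{\phi_{i,\xi_j}(s)\,b_{\beta_j}(X_i(s))}{\sigma(X_i(s))}\right|^k ds < \infty,$$

Theorem 7.1 of Mao (2011) (page 39) shows that

$$E_{\theta_0}\left|\int_0^{T_i}\frac{\phi_{i,\xi_j}(s)\,b_{\beta_j}(X_i(s))}{\sigma(X_i(s))}\,dW_i(s)\right|^k \leq \left(\frac{k(k-1)}{2}\right)^{k/2}T_i^{\frac{k-2}{2}}\,E_{\theta_0}\int_0^{T_i}\left|\frac{\phi_{i,\xi_j}(s)\,b_{\beta_j}(X_i(s))}{\sigma(X_i(s))}\right|^k ds. \tag{S-1.2}$$

Combining (S-1.1) with (S-1.2) and the result $E_{\theta_0}(V_{i,\theta_0,\theta_j})^k < \infty$, it follows that $E_{\theta_0}(U_{i,\theta_j})^k < \infty$.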

As regards continuity of the moments for k > 2, first note that in the context of k = 1, we have 
shown almost sure continuity of Vi : g 1 with respect to 9\. Hence, V k 0 , is almost surely continuous with 

respect to 6 \. That is, oJ n ‘ —> 0\ implies V k 


i,e ( 1 rn '> 


V k ~ , almost surely. Once again, dominated 
2,6/1 


convergence theorem allows us to conclude that Eg 0 g (™Jj —>■ Eg 0 0 j ", implying continuity 

°f Eg Q with respect to 6 \. Similarly, it is easy to see that Eg 0 (V, g (} g l ) k is continuous with 

respect to 0 \. To see continuity of Eg 0 (U 1 g l ) k , first note that 


E, 


Oo 


l 


Ti <t>i i c 1 (s)b 0 i (X i (s))' 


v( x i(s)) 




ds 


0, 


as m —>• oo. The result follows as before by first noting pointwise convergence, and then using (H4'j 
and then (3.1) of MB, along with (HI') and boundedness of 4>. ( m ). By Ito isometry it holds that 

*!S1 


E, 


Oo 


l 


Ti rTi ^(X^s)) 


o(Xi(s)) 


-dWi(s) — f 
Jo 


Hence, 


[Ti <!>■ +(m){s)b (m)(Xi(s)) r 

/ —- TTFT^ - dWi(s )-»■ 

Jo Jo 


<r(Xi(s)) 

Ti ti&WpS X i(s)) 


dWiis) 


dWi(s) 


1 0 <r( x i(s)) Jo o-pC(s)) 

in probability, as m —> oo. Since V. () fl ( m ) —>• V- Qq ^ almost surely as m —> oo, it follows from 
(2.11) of MB that U. e ( m ) —»• U i g i in probability, so that U k ^ rn) —> U k ~ in probability. Using 
(H4'), (3.1) of MB and (Hi'), it is easily seen, using the same methods associated with (IS-l.ll) and 

< 00, proving that U k ^ (rn) is uniformly integrable. Hence, 


(IS-1.21 ). that sup Eg 0 (U. ( m ) 

m, \ ,C7 l 


2 k 


m= 1 


Eg 0 (jj. 0 ( m )^ —> Eg 0 (lJ i . In other words, Eg 0 (U^g^ is continuous in 0\. 


S-2 Proof of Theorem 7 of MB 

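Let us consider the martingale sequence

$$S_N = \sum_{n=1}^{N}\left[\log\left(\frac{I_n}{I_{n-1}}\right) + \mathcal{K}(f_{0n}, \hat f_n)\right],$$

which is a martingale because $E[\log(I_n/I_{n-1}) \mid X_1, X_2, \ldots, X_{n-1}] = -\mathcal{K}(f_{0n}, \hat f_n)$. Using the above it
can be verified that if (5.8) of MB holds, implying

$$\sum_{n=1}^{\infty} n^{-2}\,\mathrm{Var}\left[\log\left(\frac{I_n}{I_{n-1}}\right)\right] < \infty,$$

then $S_N/N \to 0$ almost surely. Therefore

$$N^{-1}\log I_N + N^{-1}\sum_{n=1}^{N}\mathcal{K}(f_{0n}, \hat f_n) \to 0, \tag{S-2.1}$$

almost surely, as $N \to \infty$.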
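The conditional expectation behind the martingale property can be spelled out in one line. The derivation below is a sketch that assumes the ratio identity $I_n/I_{n-1} = \hat f_n(X_n)/f_{0n}(X_n)$, the general analogue of the identity displayed for the SDE set-up in Section 6.4 of MB, with $\hat f_n$ depending on the data only through $X_1, \ldots, X_{n-1}$.

```latex
% X_n is independent of X_1, ..., X_{n-1} and has density f_{0n}, so
E\left[\log\frac{I_n}{I_{n-1}} \,\Big|\, X_1,\ldots,X_{n-1}\right]
  = \int f_{0n}(x)\,\log\frac{\hat f_n(x)}{f_{0n}(x)}\,dx
  = -\int f_{0n}(x)\,\log\frac{f_{0n}(x)}{\hat f_n(x)}\,dx
  = -\mathcal{K}\left(f_{0n}, \hat f_n\right).
```

Each summand of $S_N$ therefore has conditional mean zero given the past, which is the martingale difference property used above.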

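Now consider $N^{-1}\sum_{i=1}^{N}\log\frac{f_{0i}(X_i)}{f_i(X_i)}$. If (5.7) of MB holds, implying

$$\sum_{i=1}^{\infty} i^{-2}\,\mathrm{Var}\left[\log\frac{f_{0i}(X_i)}{f_i(X_i)}\right] < \infty,$$

then by Kolmogorov’s strong law of large numbers in the independent but non-identical case,

$$\frac{1}{N}\sum_{i=1}^{N}\log\frac{f_{0i}(X_i)}{f_i(X_i)} \to \mathcal{K}^{\infty}(f_0, f),$$

almost surely, as $N \to \infty$. Let $\mathcal{N}_0(c) = \{f : \delta \leq \mathcal{K}^{\infty}(f_0, f) \leq \delta + c\}$, where $c > 0$. Now, note that

$$\begin{aligned}
I_N &= \int\frac{\prod_{i=1}^{N} f_i(X_i)}{\prod_{i=1}^{N} f_{0i}(X_i)}\,\pi(d\tilde f)\\
&\geq \int_{\mathcal{N}_0(c)}\exp\left(\sum_{i=1}^{N}\log\frac{f_i(X_i)}{f_{0i}(X_i)}\right)\pi(d\tilde f)\\
&= \int_{\mathcal{N}_0(c)}\exp\left(-\sum_{i=1}^{N}\log\frac{f_{0i}(X_i)}{f_i(X_i)}\right)\pi(d\tilde f).
\end{aligned}$$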

By Jensen’s inequality,

$$\frac{1}{N}\log I_N \geq -\int_{\mathcal{N}_0(c)}\left(\frac{1}{N}\sum_{i=1}^{N}\log\frac{f_{0i}(X_i)}{f_i(X_i)}\right)\pi(d\tilde f). \tag{S-2.2}$$

The integrand on the right hand side converges to $\mathcal{K}^{\infty}(f_0, f)$, pointwise for every $f$, given any sequence
$\{X_i\}_{i=1}^{\infty}$ associated with the complement of some null set. Since, for all such sequences, uniform
integrability of the integrand is guaranteed by (5.7) of MB, it follows that the right hand side of (S-2.2)
converges to $-\int_{\mathcal{N}_0(c)}\mathcal{K}^{\infty}(f_0, f)\,\pi(d\tilde f)$ almost surely. Hence, almost surely,

$$\begin{aligned}
\liminf_{N} N^{-1}\log I_N &\geq -\int_{\mathcal{N}_0(c)}\mathcal{K}^{\infty}(f_0, f)\,\pi(d\tilde f)\\
&\geq -(\delta + c)\,\pi(\mathcal{N}_0(c))\\
&\geq -(\delta + c).
\end{aligned}$$

Since $c > 0$ is arbitrary, it follows that

$$\liminf_{N} N^{-1}\log I_N \geq -\delta, \tag{S-2.3}$$

almost surely. Now, due to (5.2) of MB it follows that $\mathcal{K}(f_{0n}, f_n) \geq \delta$ for all $n$ with probability 1, so
that $\mathcal{K}(f_{0n}, \hat f_n) = \mathcal{K}(f_{0n}, \hat f_{n,A_n(\delta)})$, where $A_n(\delta)$ is given by (5.5) of MB. By the $Q^*$ property it follows
that

$$\liminf_{N} N^{-1}\sum_{n=1}^{N}\mathcal{K}(f_{0n}, \hat f_n) \geq \delta.$$

Hence, it follows from (S-2.1) that

$$\limsup_{N} N^{-1}\log I_N \leq -\delta. \tag{S-2.4}$$

Combining (S-2.3) and (S-2.4) it follows that

$$\lim_{N\to\infty} N^{-1}\log I_N = -\delta,$$

almost surely.


S-3 Proof of Lemma 10 of MB 


The proofs of (6.3) - (6.5) of MB follow in the same way as the proofs of (3.5) - (3.7) of MB, using
compactness of $\mathfrak{X}$, $\mathfrak{T}$ and $\boldsymbol{Z}$ in addition to that of $\mathfrak{B}$ and $\Gamma$.

For the proofs of continuity of the moments, note that as in the iid case, uniform integrability is
ensured by (H4'), (3.1) of MB and compactness of the sets $\mathfrak{B}$, $\Gamma$, $\mathfrak{X}$, $\mathfrak{T}$ and $\boldsymbol{Z}$. The rest of the proof is
almost the same as the proof of Theorem 5 of Maitra and Bhattacharya (2016c).


S-4 Proof of Lemma 11 of MB 

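For notational simplicity, let

$$V_{x^i,\beta_0}(s) = \frac{b^2_{\beta_0}(X_i(s,x^i))}{\sigma^2(X_i(s,x^i))},\qquad
V_{x^i,\beta_0,\beta_1}(s) = \frac{b_{\beta_1}(X_i(s,x^i))\,b_{\beta_0}(X_i(s,x^i))}{\sigma^2(X_i(s,x^i))},\qquad
V_{x^i,\beta_1}(s) = \frac{b^2_{\beta_1}(X_i(s,x^i))}{\sigma^2(X_i(s,x^i))}.$$

Continuity of $\mathcal{K}(f_{\theta_0,x,T,z}, f_{\theta_1,x,T,z})$ with respect to $x$ and $T$, the fact that $x^k \to x^{\infty}$ and $T_k \to T^{\infty}$
as $k \to \infty$, assumption (H5'), and the dominated convergence theorem together ensure that

$$\begin{aligned}
&\lim_{n\to\infty}\frac{\sum_{k=1}^{n}\mathcal{K}\left(f_{\theta_0,x^k,T_k,z_k}, f_{\theta_1,x^k,T_k,z_k}\right)}{n}\\
&= \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\int_0^{T_k}\left\{\frac{\phi^2_{k,\xi_0}(s)}{2}E_{\theta_0}\left(V_{x^k,\beta_0}(s)\right) - \phi_{k,\xi_0}(s)\,\phi_{k,\xi_1}(s)\,E_{\theta_0}\left(V_{x^k,\beta_0,\beta_1}(s)\right) + \frac{\phi^2_{k,\xi_1}(s)}{2}E_{\theta_0}\left(V_{x^k,\beta_1}(s)\right)\right\}ds\\
&= \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\int_0^{T_k}\Bigg\{\frac{\left(\xi_{00} + \xi_{10}g_1(z_{k1}(s)) + \cdots + \xi_{p0}g_p(z_{kp}(s))\right)^2}{2}E_{\theta_0}\left(V_{x^k,\beta_0}(s)\right)\\
&\qquad - \left(\xi_{00} + \xi_{10}g_1(z_{k1}(s)) + \cdots + \xi_{p0}g_p(z_{kp}(s))\right)\left(\xi_{01} + \xi_{11}g_1(z_{k1}(s)) + \cdots + \xi_{p1}g_p(z_{kp}(s))\right)E_{\theta_0}\left(V_{x^k,\beta_0,\beta_1}(s)\right)\\
&\qquad + \frac{\left(\xi_{01} + \xi_{11}g_1(z_{k1}(s)) + \cdots + \xi_{p1}g_p(z_{kp}(s))\right)^2}{2}E_{\theta_0}\left(V_{x^k,\beta_1}(s)\right)\Bigg\}ds\\
&= \int_0^{T^{\infty}}\Bigg\{\left(\frac{\xi^2_{00}}{2} + \xi_{00}\sum_{l=1}^{p}\xi_{l0}c_l(s) + \frac{1}{2}\sum_{l=1}^{p}\sum_{m=1}^{p}\xi_{l0}\xi_{m0}c_l(s)c_m(s)\right)E_{\theta_0}\left(V_{x^{\infty},\beta_0}(s)\right)\\
&\qquad - \left(\xi_{00}\xi_{01} + \xi_{00}\sum_{l=1}^{p}\xi_{l1}c_l(s) + \xi_{01}\sum_{l=1}^{p}\xi_{l0}c_l(s) + \sum_{l=1}^{p}\sum_{m=1}^{p}\xi_{l0}\xi_{m1}c_l(s)c_m(s)\right)E_{\theta_0}\left(V_{x^{\infty},\beta_0,\beta_1}(s)\right)\\
&\qquad + \left(\frac{\xi^2_{01}}{2} + \xi_{01}\sum_{l=1}^{p}\xi_{l1}c_l(s) + \frac{1}{2}\sum_{l=1}^{p}\sum_{m=1}^{p}\xi_{l1}\xi_{m1}c_l(s)c_m(s)\right)E_{\theta_0}\left(V_{x^{\infty},\beta_1}(s)\right)\Bigg\}ds,
\end{aligned} \tag{S-4.1}$$

which is the Kullback-Leibler distance between the models of the same form as $f_{\theta_0,x,T,z}$ and $f_{\theta_1,x,T,z}$
but with $x$, $T$ and $g_l(z_l(s))$ replaced with $x^{\infty}$, $T^{\infty}$ and $c_l(s)$.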
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